Researcher Profile

Caiming Xiong

Bootstrapped vision-language pretraining (BLIP)

Researcher at Salesforce

Co-authored BLIP: a high-impact recipe for unified vision-language understanding and generation.

Organizations

Salesforce, China Agricultural University

Topics

Multimodal

About This Page

This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.

Last updated

March 20, 2026

Best First Clicks

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (paper)

Official And External Links

ORCID, OpenAlex

Known For

The ideas, systems, and research directions that make this person worth knowing.

Bootstrapped vision-language pretraining (BLIP)

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Multimodal

Vision-language

Start Here

Canonical papers, project pages, or repositories that anchor this profile.

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (paper)

Signature Works

Additional papers, projects, or repositories that help flesh out the profile.

OpenAlex (profile), ORCID (profile)

Supporting Sources

Additional links that help verify and flesh out this profile.

OpenAlex (profile), ORCID (profile)

Related Researchers

People worth exploring next because they share topics, labs, or source material with this profile.

Shared canonical source

Dongxu Li

Bootstrapped vision-language pretraining (BLIP)

1 source

Co-authored BLIP: a high-impact recipe for unified vision-language understanding and generation.

Multimodal

Start Here: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Shared canonical source

Junnan Li

Bootstrapped vision-language pretraining (BLIP)

1 source

Co-authored BLIP: a high-impact recipe for unified vision-language understanding and generation.

Multimodal

Start Here: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Shared canonical source

Steven Hoi

Bootstrapped vision-language pretraining (BLIP)

1 source

Co-authored BLIP: a high-impact recipe for unified vision-language understanding and generation.

Multimodal

Start Here: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Shared topic

Aditya Ramesh

Vision-language pretraining (CLIP)

1 source

Co-authored CLIP: a core reference for contrastive multimodal pretraining.

OpenAI, Multimodal

Start Here: Learning Transferable Visual Models From Natural Language Supervision

Shared topic

Chris Hallacy

Vision-language pretraining (CLIP)

1 source

Co-authored CLIP: a core reference for contrastive multimodal pretraining.

OpenAI, Multimodal

Start Here: Learning Transferable Visual Models From Natural Language Supervision

Shared topic

Gabriel Goh

Vision-language pretraining (CLIP)

1 source

Co-authored CLIP: a core reference for contrastive multimodal pretraining.

OpenAI, Multimodal

Start Here: Learning Transferable Visual Models From Natural Language Supervision