A high-signal researcher for understanding efficient multimodal LLMs, especially in areas where frozen vision encoders, large language model backbones, and vision-language pre-training meet.
Researcher Profile
Silvio Savarese
BLIP-2 and frozen-encoder multimodal LLMs
Co-author, BLIP-2
Co-authored BLIP-2, which bootstraps vision-language pre-training from a frozen image encoder and a frozen large language model via a lightweight Querying Transformer (Q-Former): a key step toward efficient vision-language models built around LLM backbones.
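To make the frozen-encoder idea concrete, here is a minimal sketch of running a public BLIP-2 checkpoint through the Hugging Face transformers library. The checkpoint name (Salesforce/blip2-opt-2.7b), the example image URL, and the prompt are illustrative assumptions, not details taken from this profile.

# Minimal sketch: visual question answering with a public BLIP-2 checkpoint.
# Assumes the `transformers` (>= 4.27), `torch`, `Pillow`, and `requests`
# packages are installed. The checkpoint name below is an assumption.
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# In BLIP-2, the image encoder and the language model stay frozen during
# pre-training; only a lightweight Q-Former is trained to bridge them.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# Any RGB image works; this COCO validation image is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

prompt = "Question: what is shown in the picture? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, model.dtype)

generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())

The point of the sketch is the division of labor: the heavy image encoder and the LLM arrive pre-trained and frozen, while the small Q-Former carries the cross-modal alignment, which is what makes the approach comparatively cheap to train.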
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Last updated
March 20, 2026
Known For
The ideas, systems, and research directions that make this person worth knowing.
01
BLIP-2 and frozen-encoder multimodal LLMs
02
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
03
Multimodal
04
Vision-language
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (Li et al., ICML 2023)
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
A foundational figure in modern sequence modeling whose work on the Transformer changed the technical direction of language and multimodal systems.
One of the more useful people to study for the Gemini era because his work spans both the text core of multimodal frontier models and the optimization techniques that make those systems cheaper and more stable to train.
A high-signal researcher for understanding the modern scaling playbook, especially around compute-optimal training, retrieval-augmented language models, and the text side of Gemini-era multimodal systems.
Important for understanding how multilingual NLP, translation, and multimodal reasoning meet inside production-scale frontier systems rather than staying separate research tracks.
A useful name for the speech side of Google's frontier stack, especially if you want to trace the lineage from voice search and speech recognition systems to Gemini's audio capabilities.