A high-signal researcher for understanding efficient multimodal LLMs, especially in areas where frozen vision encoders, large language model backbones, and vision-language pre-training meet.
Researcher Profile
Silvio Savarese
BLIP-2 and frozen-encoder multimodal LLMs
Co-author, BLIP-2
Co-authored BLIP-2, which bootstraps vision-language pre-training from a frozen image encoder and a frozen large language model via a lightweight Querying Transformer (Q-Former): a key step toward efficient vision-language models built around LLM backbones.
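To make the frozen-encoder idea concrete, here is a minimal sketch of running a public BLIP-2 checkpoint through the Hugging Face transformers library. The checkpoint name (Salesforce/blip2-opt-2.7b), the example image URL, and the prompt are illustrative assumptions, not details taken from this profile.

# Minimal sketch: visual question answering with a public BLIP-2 checkpoint.
# Assumes the `transformers` (>= 4.27), `torch`, `Pillow`, and `requests`
# packages are installed. The checkpoint name below is an assumption.
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# In BLIP-2, the image encoder and the language model stay frozen during
# pre-training; only a lightweight Q-Former is trained to bridge them.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# Any RGB image works; this COCO validation image is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

prompt = "Question: what is shown in the picture? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, model.dtype)

generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())

The point of the sketch is the division of labor: the heavy image encoder and the LLM arrive pre-trained and frozen, while the small Q-Former carries the cross-modal alignment, which is what makes the approach comparatively cheap to train.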
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Last updated
March 20, 2026
Known For
The ideas, systems, and research directions that make this person worth knowing.
01
BLIP-2 and frozen-encoder multimodal LLMs
02
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
03
Multimodal
04
Vision-language
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (Li et al., ICML 2023)
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
A foundational figure in modern sequence modeling whose work on the Transformer changed the technical direction of language and multimodal systems.
One of the more useful people to study for the Gemini era because his work spans both the text core of multimodal frontier models and the optimization techniques that make those systems cheaper and more stable to train.
A high-signal researcher for understanding the modern scaling playbook, especially around compute-optimal training, retrieval-augmented language models, and the text side of Gemini-era multimodal systems.
Important for understanding how multilingual NLP, translation, and multimodal reasoning meet inside production-scale frontier systems rather than staying separate research tracks.
A useful name for the speech side of Google's frontier stack, especially if you want to trace the lineage from voice search and speech recognition systems to Gemini's audio capabilities.