Researcher Profile
Jean-Baptiste Alayrac
Gemini (multimodal foundation models)
Co-lead for multimodal vision on Gemini
One of the clearest multimodal researchers to track if you want to understand how frontier labs turned vision-language work from narrow benchmarks into general-purpose model capability.
Organizations
Google DeepMind
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Known For
The ideas, systems, and research directions that make this person worth knowing.
01
Vision-language foundation models
02
Flamingo and multimodal few-shot learning
03
Long-context multimodal systems such as Gemini
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
Gemini: A Family of Highly Capable Multimodal Models
Flamingo: a Visual Language Model for Few-Shot Learning
Supporting Sources
Additional links that help verify and flesh out this profile.
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
One of the more useful people to study for the Gemini era because his work spans both the text core of multimodal frontier models and the optimization tricks that make those systems cheaper and more stable to train.
A high-signal researcher for understanding the modern scaling playbook, especially around compute-optimal training, retrieval-augmented language models, and the text side of Gemini-era multimodal systems.
A strong researcher to study for the evolution of Google’s multimodal stack from vision-language pretraining and image generation into Gemini-era foundation models.
Important for understanding how multilingual NLP, translation, and multimodal reasoning meet inside production-scale frontier systems rather than remaining separate research tracks.
A good researcher to follow for the infrastructure side of frontier language models, especially mixture-of-experts scaling, instruction tuning, and the data systems that make very large models usable.
Worth tracking for the data side of multimodal frontier models, where the quality and shape of training mixtures strongly determine what large systems can actually do.