Researcher Profile

Editor reviewed

Matan Kalman

Faster LLM inference via speculative decoding

Researcher at Google Research working on faster inference and transformer efficiency

An important profile for systems readers: he is one of the named authors of the speculative decoding paper, a technique that became part of the mainstream conversation about making large-model inference materially faster without changing the output distribution.

Organizations

Google Research

Topics

Systems & Infrastructure

Agents & Reasoning

About This Page

This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.

Last reviewed

March 18, 2026

Best First Clicks

Fast Inference from Transformers via Speculative Decoding (paper)

Selective Attention Improves Transformer (paper)

Prompt Repetition Improves Non-Reasoning LLMs (paper)

Known For

The ideas, systems, and research directions that make this person worth knowing.

Speculative decoding

Faster transformer inference

Efficiency-oriented model design

Faster LLM inference via speculative decoding

Fast Inference from Transformers via Speculative Decoding

Inference

Start Here

Canonical papers, project pages, or repositories that anchor this profile.

Fast Inference from Transformers via Speculative Decoding (paper)

Selective Attention Improves Transformer (paper)

Prompt Repetition Improves Non-Reasoning LLMs (paper)

Signature Works

Additional papers, projects, or repositories that help flesh out the profile.

UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image (paper)

Supporting Sources

Additional links that help verify and flesh out this profile.

UniTune: Text-Driven Image Editing by Fine Tuning a Diffusion Model on a Single Image (paper)

Related Researchers

People worth exploring next because they share topics, labs, or source material with this profile.

Shared canonical source

Yaniv Leviathan

Faster LLM inference via speculative decoding

4 sources

A high-signal researcher for the latency and systems side of modern language models, especially where clever decoding tricks turn frontier models into usable products.

Systems & Infrastructure

Start Here: Yaniv Leviathan

Shared canonical source

Yossi Matias

Faster LLM inference via speculative decoding

3 sources

Important because his work sits at the intersection of field-level research leadership and concrete systems work, such as speculative decoding, that directly changed how modern LLM inference gets deployed.

Systems & Infrastructure

Start Here: Yossi Matias

Shared topics

Geoffrey Irving

Reasoning, verification, math

4 sources

A useful person to study if you care about alignment proposals that try to make superhuman systems legible enough for humans to supervise in practice.

Google DeepMind, Multimodal, Post-Training & Alignment

Start Here: Red Teaming Language Models with Language Models

Shared topics

Oriol Vinyals

Sequence models, large-scale ML

4 sources

A high-signal researcher for understanding how DeepMind approaches generality, especially in areas where reinforcement learning, multimodality, and large-scale systems meet.

Google DeepMind, Multimodal, Systems & Infrastructure

Start Here: DeepMind and Blizzard open StarCraft II as an AI research environment

Shared topics

Radu Soricut

Gemini (multimodal foundation models)

4 sources

Important for understanding how multilingual NLP, translation, and multimodal reasoning meet inside production-scale frontier systems rather than staying separate research tracks.

Multimodal, Systems & Infrastructure

Start Here: Radu Soricut

Shared topics

Ioannis Antonoglou

Gemini (multimodal foundation models)

4 sources

A high-signal reinforcement-learning researcher whose work sits on the path from AlphaGo-era planning systems to Gemini-era reasoning and post-training techniques.

Multimodal, Post-Training & Alignment

Start Here: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm