A key researcher for understanding why state-space models became a serious alternative to standard transformer stacks rather than remaining a recurring side path.
Researcher Profile
Tri Dao
Efficient sequence models + attention kernels
Assistant professor at Princeton and chief scientist of Together AI
One of the clearest researchers to follow for efficient sequence-model systems, especially the line of work that made frontier training and inference materially faster rather than merely cleaner on paper.
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Last reviewed
March 18, 2026
Known For
The ideas, systems, and research directions that make this person worth knowing.
01 FlashAttention
02 Mamba and selective state spaces
03 Systems-aware model design for efficient training and inference
04 Efficient sequence models + attention kernels
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
01 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
02 FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
One of the more useful people to follow for the systems side of modern model building, especially where better kernels and sequence methods translate directly into frontier-model training and inference speed.
Worth following because he brings a real theory background into the model-systems layer, especially where structured linear algebra and sequence methods end up mattering for practical modern architectures.
Important because he sits at a productive seam between machine learning, data systems, and model infrastructure, with work that ranges from weak supervision to some of the most important efficiency breakthroughs in modern training stacks.
A high-signal figure for understanding how DeepMind turned ambitious research systems into durable products, especially across reinforcement learning, speech, and code generation.
Foundational less for any single public paper than for shaping the infrastructure, engineering culture, and systems thinking that make frontier-model research possible.