A key researcher for understanding why state-space models became a serious alternative to standard transformer stacks rather than remaining a recurring side path.
Researcher Profile
Tri Dao
Efficient sequence models + attention kernels
Assistant professor at Princeton and chief scientist of Together AI
One of the clearest researchers to follow for efficient sequence-model systems, especially the line of work that made frontier training and inference materially faster rather than merely cleaner on paper.
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Last reviewed
March 18, 2026
Known For
The ideas, systems, and research directions that make this person worth knowing.
01 FlashAttention
02 Mamba and selective state spaces
03 Systems-aware model design for efficient training and inference
04 Efficient sequence models + attention kernels
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
01 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
02 FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
One of the more useful people to follow for the systems side of modern model building, especially where better kernels and sequence methods translate directly into frontier-model training and inference speed.
Worth following because he brings a real theory background into the model-systems layer, especially where structured linear algebra and sequence methods end up mattering for practical modern architectures.
Important because he sits at a productive seam between machine learning, data systems, and model infrastructure, with work that ranges from weak supervision to some of the most important efficiency breakthroughs in modern training stacks.
A high-signal figure for understanding how DeepMind turned ambitious research systems into durable products, especially across reinforcement learning, speech, and code generation.
Foundational less for any single public paper than for shaping the infrastructure, engineering culture, and systems thinking that make frontier-model research possible.