Researcher Profile
Mike Lewis
Researcher behind BART, retrieval-augmented generation, and long-context language-model work
A strong person to study for the modern NLP stack because his work spans denoising pretraining, retrieval-augmented generation, and later long-context inference tricks rather than only one phase of the language-model pipeline.
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Known For
The ideas, systems, and research directions that make this person worth knowing.
01
BART and sequence-to-sequence pretraining
02
Retrieval-augmented generation
03
Long-context language-model inference
04
Streaming + long-context stability (attention sinks; sketched below)
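Of the directions above, attention sinks is the most concrete, so a short sketch helps. The observation behind "Efficient Streaming Language Models with Attention Sinks" is that keeping a few initial "sink" tokens in the KV cache, plus a sliding window of recent tokens, keeps generation stable far beyond the training length. Below is a minimal Python sketch of that eviction rule only; SinkKVCache and its parameter names are hypothetical, chosen for illustration, not the paper's reference implementation.

```python
# Minimal sketch of the attention-sink KV-cache eviction policy.
# Hypothetical names; illustrates the idea, not the reference implementation.

class SinkKVCache:
    """Keeps the first `n_sink` tokens plus the most recent `window` tokens."""

    def __init__(self, n_sink: int = 4, window: int = 1020):
        self.n_sink = n_sink  # initial "sink" tokens that stabilize attention
        self.window = window  # sliding window of recent tokens
        self.keys: list = []
        self.values: list = []

    def append(self, key, value) -> None:
        """Add the newest token's KV pair, evicting middle tokens if over budget."""
        self.keys.append(key)
        self.values.append(value)
        if len(self.keys) > self.n_sink + self.window:
            # Evict from just after the sinks, never the sinks themselves:
            # cache = first n_sink tokens + last `window` tokens.
            self.keys = self.keys[: self.n_sink] + self.keys[-self.window :]
            self.values = self.values[: self.n_sink] + self.values[-self.window :]


# Usage: stream tokens indefinitely with a bounded cache.
cache = SinkKVCache(n_sink=4, window=8)
for t in range(100):
    cache.append(f"k{t}", f"v{t}")
assert len(cache.keys) == 12                        # 4 sinks + 8 recent
assert cache.keys[:4] == ["k0", "k1", "k2", "k3"]   # sinks survive eviction
```

Note that the real method also reassigns positions within the cache (tokens are addressed by cache slot rather than original position); this sketch shows only which entries survive eviction.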
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Efficient Streaming Language Models with Attention Sinks
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
A strong systems page because his work repeatedly shows up where inference efficiency meets usable long context, especially in attention sinks, StreamingLLM, post-training quantization, and later long-context head designs.
A high-signal researcher for the systems side of modern AI, especially where reinforcement learning, memory-efficient large-model training, and long-context inference meet.
A strong researcher to follow for efficient and long-context LLM systems, especially where inference tricks and memory management make large models practical to run.
One of the clearest researchers to follow for efficient AI systems, especially the line of work that makes large models smaller, faster, and easier to deploy without giving up too much quality.
A strong person to follow if you care about open-weight language models and retrieval-heavy NLP systems, especially the line from RoBERTa and RAG into LLaMA-era model development.
A strong page because his work cuts across two important threads in modern language models: early retrieval-augmented generation systems like Atlas and the later LLaMA open-weight model line.