A high-signal researcher for the post-attention design space, especially if you care about the line of work trying to make linear-attention and Delta-rule models actually competitive in real language-model systems.
Researcher Profile
Yikang Shen
Linear transformers via the delta rule
Researcher working on efficient sequence models and multimodal RLHF
Useful because his work links two strands that usually get discussed separately: efficient sequence-model architectures on one side and multimodal alignment work on the other.
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Last reviewed: March 18, 2026
Known For
The ideas, systems, and research directions that make this person worth knowing.
01. Gated linear attention and Delta-rule models (see the sketch after this list)
02. Multimodal RLHF and hallucination reduction
03. Research at the boundary of systems efficiency and alignment
04. Linear transformers via the delta rule
05. Parallelizing Linear Transformers with the Delta Rule over Sequence Length
06. DeltaNet
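For orientation only, and not taken from this page: the delta rule behind items 01 and 04 through 06 writes to a fast-weight memory by first reading out what the memory currently stores for a key and then correcting it toward the new value, rather than simply adding another outer product the way plain linear attention does. Below is a minimal NumPy sketch of that recurrence, with all function and variable names hypothetical; it is an illustration under those assumptions, not any researcher's released implementation.

```python
import numpy as np

def delta_rule_attention(q, k, v, beta):
    """Sequential delta-rule linear attention (illustrative sketch).

    q, k: (T, d_k) query/key sequences (keys assumed L2-normalized)
    v:    (T, d_v) value sequence
    beta: (T,) per-step writing strengths in (0, 1)

    Returns outputs of shape (T, d_v).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))          # fast-weight memory
    out = np.empty((T, d_v))
    for t in range(T):
        v_old = S @ k[t]              # what the memory currently returns for k_t
        # Delta rule: move the stored association for k_t toward v_t.
        S = S + beta[t] * np.outer(v[t] - v_old, k[t])
        out[t] = S @ q[t]
    return out

# Toy usage: T=4 steps, d_k = d_v = 3, keys L2-normalized.
rng = np.random.default_rng(0)
k = rng.normal(size=(4, 3))
k /= np.linalg.norm(k, axis=1, keepdims=True)
q, v = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
print(delta_rule_attention(q, k, v, beta=np.full(4, 0.5)).shape)  # (4, 3)
```

Note that this loop shows only the token-by-token recurrence; the contribution of the "Parallelizing Linear Transformers with the Delta Rule over Sequence Length" paper listed above is a chunkwise reformulation that makes training efficient on parallel hardware, which this sketch does not attempt.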
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
Signature Works
Additional papers, projects, or repositories that help flesh out the profile.
Supporting Sources
Additional links that help verify and flesh out this profile.
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
Worth knowing because he is one of the recurring names in the recent MIT line of work on linear-attention alternatives, especially where hardware-efficient training meets practical long-context sequence modeling.
Worth surfacing because he is lead author on the Gated Slot Attention paper, which is one of the clearer attempts to push the RWKV-adjacent efficient-sequence line toward stronger memory and retrieval behavior rather than stopping at architecture novelty.
A useful researcher to study for the line from classic neural NLP into today’s efficient large-model work, with papers that span early sentence models, character-aware language modeling, and current sequence-model efficiency research.
A useful person to study if you care about alignment proposals that try to make superhuman systems legible enough for humans to supervise in practice.
A good researcher to follow for the infrastructure side of frontier language models, especially mixture-of-experts scaling, instruction tuning, and the data systems that make very large models usable.