Co-authored Deep RL from Human Preferences: an early anchor for RLHF-style post-training.
Researcher Profile
Paul Christiano
Alignment theory, reward modeling
Founder at Alignment Research Center
A foundational thinker on oversight, reward modeling, and delegation-style alignment, whose ideas have influenced much of the modern post-training conversation.
Organizations
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Last reviewed
March 18, 2026
Known For
The ideas, systems, and research directions that make this person worth knowing.
01 Reward modeling
02 Scalable oversight and delegation
03 Foundational alignment theory for advanced systems
04 Alignment theory, reward modeling
05 Deep Reinforcement Learning from Human Preferences
06 Alignment
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
Supporting Sources
Additional links that help verify and flesh out this profile.
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
Co-authored Deep RL from Human Preferences: an early anchor for RLHF-style post-training.
Co-authored an early RLHF recipe for helpful + harmless assistants.
A high-signal figure for understanding the frontier model era because his work sits at the intersection of scaling, post-training, and deployment-risk framing.
A high-signal researcher for understanding how post-training and behavioral steering become concrete product behavior rather than abstract alignment talk.