Researcher Profile

Sainbayar Sukhbaatar

Self-rewarding post-training

Co-author of Self-Rewarding Language Models, a paper that explores self-improvement in which a language model serves as its own reward model.

About This Page

This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.

Last updated: March 20, 2026

Known For

The ideas, systems, and research directions that make this person worth knowing.

01 Self-rewarding post-training

02 Self-Rewarding Language Models

03 Post-training

04 Alignment

05 Preference Optimization

Start Here

Canonical papers, project pages, or repositories that anchor this profile.

Related Researchers

People worth exploring next because they share topics, labs, or source material with this profile.