Researcher Profile
Kexin Pei
Measuring real-world coding ability (SWE-bench)
Co-author, SWE-bench
Co-authored SWE-bench: a key benchmark for whether models can resolve real GitHub issues.
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Last updated
March 20, 2026
Known For
The ideas, systems, and research directions that make this person worth knowing.
01
Measuring real-world coding ability (SWE-bench)
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
01
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? (paper)
02
SWE-bench (GitHub)
Topics
Evaluation
SWE-bench
Code
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile. (Names were not preserved in this extract; only the one-line descriptions remain.)
Co-authored SWE-bench: a key benchmark for whether models can resolve real GitHub issues.
Co-authored SWE-bench and ALiBi: high-leverage evaluation + long-context work.
Co-authored ARC: an influential reasoning benchmark for question answering.