Researcher Profile
Ofir Press
Evaluation + long-context extrapolation
Co-author, ALiBi
Co-authored SWE-bench and ALiBi: high-leverage evaluation + long-context work.
Topics
Evaluation · SWE-bench · Long context

About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.

Known For
The ideas, systems, and research directions that make this person worth knowing.
01 · Evaluation + long-context extrapolation
02 · SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
03 · Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (ALiBi)
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
Co-authored SWE-bench: a key benchmark for whether models can resolve real GitHub issues.
A strong person to follow for practical language systems: his work sits at the intersection of pretraining, retrieval, and question answering, exactly where product-grade NLP systems either become robust or fall apart.
A key person for understanding how foundation-model evaluation, governance, and research tooling became a coherent agenda rather than a scattered set of concerns.