Researcher Profile
Ofir Press
Evaluation + long-context extrapolation
Co-author, ALiBi
Co-authored SWE-bench and ALiBi: high-leverage evaluation + long-context work.
Topics
Evaluation · SWE-bench · Long context

About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.

Known For
The ideas, systems, and research directions that make this person worth knowing.
01 · Evaluation + long-context extrapolation
02 · SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
03 · Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation (ALiBi)
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
Co-authored SWE-bench: a key benchmark for whether models can resolve real GitHub issues.
A strong person to follow for practical language systems: his work sits at the intersection of pretraining, retrieval, and question answering, exactly where product-grade NLP systems either become robust or fall apart.
A key person for understanding how foundation-model evaluation, governance, and research tooling became a coherent agenda rather than a scattered set of concerns.