Researcher Profile

Editor reviewed

Jacob Devlin

Pretraining and representation learning for NLP

Co-author, BERT

A core name in the pretraining era of NLP, especially if you want to understand how BERT reshaped the field and how that line of work extended into broader document understanding and large-scale language systems.

Organizations

Google

Topics

Systems & Infrastructure

About This Page

This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.

Last reviewed

March 18, 2026

Best First Clicks

BERT: Pre-training of Deep Bidirectional Transformers for Language Understandingpaper Zero-shot Entity Linking by Reading Entity Descriptionspaper QueryForm: A Simple Zero-shot Form Entity Query Frameworkpaper

Known For

The ideas, systems, and research directions that make this person worth knowing.

BERT and bidirectional pretraining

Language representation learning at Google scale

Document understanding and retrieval-oriented NLP systems

Pretraining and representation learning for NLP

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

NLP

Start Here

Canonical papers, project pages, or repositories that anchor this profile.

Related Researchers

People worth exploring next because they share topics, labs, or source material with this profile.

Shared canonical source

Kenton Lee

NLP systems and evaluation

4 sources

A strong person to follow for practical language systems because his work sits right at the intersection of pretraining, retrieval, and question answering, where product-grade NLP systems either become robust or fall apart.

Evaluation & Benchmarks Systems & Infrastructure

Start HereKenton Lee

Shared canonical source

Kristina Toutanova

Bidirectional transformer pretraining (BERT)

1 source

Co-authored BERT: a turning point for transfer learning in NLP.

Start HereBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Shared canonical source

Ming-Wei Chang

Bidirectional transformer pretraining (BERT)

1 source

Co-authored BERT: a turning point for transfer learning in NLP.

Start HereBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Shared topic

Noam Shazeer

Transformers, Mixture-of-Experts, scaling

3 sources

One of the most important architecture-level thinkers in modern AI, with influence spanning Transformers, efficient scaling, and mixture-of-experts systems.

Multimodal Systems & Infrastructure

Start HereAttention Is All You Need

Shared topic

Ashish Vaswani

Transformers

3 sources

A foundational figure in modern sequence modeling whose work on the Transformer changed the technical direction of language and multimodal systems.

Multimodal Systems & Infrastructure

Start HereAttention Is All You Need

Shared topic

Niki Parmar

Transformers and sequence modeling

3 sources

A foundational transformer researcher whose work still matters because it connects the original architecture shift to later efforts on efficiency, scaling, and sequence modeling infrastructure.

Systems & Infrastructure

Start HereAttention Is All You Need