Researcher Profile
Mostofa Patwary
Model-parallel training at scale (Megatron-LM)
Co-author, Megatron-LM
Co-authored Megatron-LM: a core reference for scaling transformer training via model parallelism.
Topics
Systems
Training
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Last updated
March 20, 2026
Known For
The ideas, systems, and research directions that make this person worth knowing.
01
Model-parallel training at scale (Megatron-LM)
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
01
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
02
Megatron-LM (GitHub)
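For orientation before diving into the paper, here is a minimal single-process sketch of the tensor (intra-layer) model parallelism that Megatron-LM is known for: the MLP block GeLU(X·A)·B is computed with A split by columns and B by rows, so each shard produces a partial output and only one sum (the all-reduce in a real multi-GPU setup) is needed. The shapes, device count, and function names below are illustrative assumptions, not code from the repository.

```python
# Single-process simulation of Megatron-LM-style tensor parallelism for an MLP
# block Y = GeLU(X @ A) @ B. A is split column-wise, B row-wise; each "device"
# (here: one loop iteration) holds one shard and computes a partial result.
# Shapes and the device count are illustrative, not values from the paper.
import numpy as np

def gelu(x):
    # tanh approximation of GeLU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp_reference(x, a, b):
    # Unsharded baseline.
    return gelu(x @ a) @ b

def mlp_tensor_parallel(x, a, b, num_devices):
    # Column-split A and row-split B into one shard per device.
    a_shards = np.split(a, num_devices, axis=1)   # A = [A_1, ..., A_p]
    b_shards = np.split(b, num_devices, axis=0)   # B = [B_1; ...; B_p]
    partials = [gelu(x @ a_i) @ b_i for a_i, b_i in zip(a_shards, b_shards)]
    # Summing the partial outputs is what the all-reduce does across GPUs.
    return sum(partials)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))    # (batch, hidden)
    a = rng.standard_normal((8, 32))   # hidden -> 4*hidden
    b = rng.standard_normal((32, 8))   # 4*hidden -> hidden
    assert np.allclose(mlp_reference(x, a, b),
                       mlp_tensor_parallel(x, a, b, num_devices=4))
    print("sharded MLP matches the unsharded reference")
```

In the actual system this runs across GPUs with PyTorch and NCCL collectives rather than a loop; the point of the sketch is that splitting the first weight matrix by columns and the second by rows lets the nonlinearity stay local to each shard, leaving a single communication step per MLP forward pass.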
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
Co-authored Megatron-LM: a core reference for scaling transformer training via model parallelism.
Co-authored ZeRO: foundational memory optimizations for training very large models.