Researcher Profile
Joseph E. Gonzalez
Fast, cheap LLM serving (PagedAttention)
Co-author, vLLM
Co-authored vLLM: a widely used serving stack for efficient LLM inference.
Topics
vLLM · Serving · LLM Serving
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Last updated: March 20, 2026
Known For
The ideas, systems, and research directions that make this person worth knowing.
01 Fast, cheap LLM serving (PagedAttention)

Start Here
Canonical papers, project pages, or repositories that anchor this profile.
02 vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention
03 vLLM (GitHub)
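To make "serving stack" concrete, here is a minimal sketch of offline batch inference with vLLM's Python API, following the project's published quickstart; the model name and sampling settings below are illustrative placeholders, not recommendations from this profile.

# Minimal vLLM offline-inference sketch (based on the project's quickstart).
# The model name and sampling values are illustrative placeholders.
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "PagedAttention improves LLM serving by",
]

# SamplingParams controls decoding; these values are arbitrary examples.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# LLM wraps the inference engine; internally it manages the KV cache in
# fixed-size paged blocks (the PagedAttention idea).
llm = LLM(model="facebook/opt-125m")

# generate() batches the prompts and returns one RequestOutput per prompt.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Completion: {output.outputs[0].text!r}")

The paging scheme is what makes the "cheap" part possible: storing the KV cache in small blocks avoids the fragmentation of contiguous per-request allocations, so more concurrent requests fit in the same GPU memory.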
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.