Researcher Profile
Nicholas Carlini
Adversarial ML, security of deployed models
Researcher at Anthropic
One of the most useful researchers to study if you care about what deployed models get wrong under adversarial pressure, especially training-data extraction, adversarial attacks, and practical security failures.
Organizations
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Last reviewed: March 18, 2026
Known For
The ideas, systems, and research directions that make this person worth knowing.
01. Adversarial ML and extraction risks
02. Security research on deployed models
03. Practical failure modes in modern ML systems
04. Adversarial ML, security of deployed models
05. Nicholas Carlini (site)
06. Security
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
Supporting Sources
Additional links that help verify and flesh out this profile.
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
Co-authored Red Teaming LMs with LMs: a concrete approach to stress-testing model behavior at scale.
Co-authored Extracting Training Data from Large Language Models: a core paper on memorization and extraction risk.