LLMpeople

Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Alignment and Safety report from Anthropic with 28 connected researchers in the LLMpeople atlas.

Anthropic · 2026-01-08 · 28 researchers
Field: Alignment and Safety
Organization: Anthropic
arXiv: 2601.04603

Canonical link

https://arxiv.org/abs/2601.04603

Connected researchers

Researcher 5 reports

Dario Amodei

Anthropic / OpenAI

Co-founder and CEO of Anthropic.

United States
Researcher 7 reports

Amanda Askell

Anthropic / OpenAI

Amanda Askell is a philosopher and AI alignment researcher at Anthropic. Her personal site says she previously worked as a research scientist on the policy team at OpenAI.

United States
Researcher 7 reports

Jack Clark

Anthropic / OpenAI

Co-founder and Head of Policy at Anthropic. His public biography also notes earlier work as Policy Director at OpenAI, a technical journalist, and author of the Import AI newsletter.

Researcher 2 reports

Jared Kaplan

Anthropic

Chief Science Officer and Co-Founder of Anthropic, with public bios emphasizing scaling laws and large language models.

Researcher 5 reports

Kamal Ndousse

Anthropic

Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.

Researcher 6 reports

Deep Ganguli

Anthropic

Research scientist at Anthropic who leads the Societal Impacts team and works on AI evaluation, alignment, and societal impacts.

United States
Researcher 3 reports

Tom Henighan

Anthropic

Tom Henighan works on large language model interpretability at Anthropic. He previously worked on scaling laws at OpenAI and machine learning engineering at Beehive AI, and he studied physics at Stanford after graduating from Ohio State University in 2010 with a degree in English, mathematics, and philosophy.

Researcher 8 reports

Ethan Perez

Anthropic

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Researcher 8 reports

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Researcher 6 reports

Samuel Marks

Anthropic

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Researcher 2 reports

Jan Leike

Anthropic

Anthropic alignment researcher whose personal site says he leads the Alignment Science team; previously co-led OpenAI's Superalignment team and earlier worked on reinforcement learning from human feedback at DeepMind.

Researcher 2 reports

William Saunders

Anthropic

William Saunders is a research scientist at Anthropic working on aligning and evaluating language models. His public homepage says he works at the intersection of game theory, optimization, and deep learning, previously interned at OpenAI, DeepMind, and Mila, studied mathematics at the University of Oxford, and is a PhD student in machine learning at Carnegie Mellon University.

Researcher 3 reports

Alex Tamkin

Anthropic

Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.

Researcher 2 reports

Yanda Chen

Anthropic

Yanda Chen is a member of technical staff at Anthropic and a PhD candidate in computer science at Georgetown University advised by Kevin Knight. His homepage says he previously worked at Allen Institute for AI and focuses on AI safety, natural language processing, and deep learning.

Researcher 2 reports

Beth Barnes

Anthropic

President of METR and former team member at Anthropic whose work focuses on evaluating and forecasting frontier AI capabilities.

Researcher 2 reports

Jacob Hilton

Anthropic

Jacob Hilton is a researcher and executive director at Alignment Research Center, where he works on mechanistic approaches to outperforming random sampling. He previously worked at OpenAI on truthfulness, reinforcement learning, and interpretability for language models, earlier worked at Jane Street, completed a PhD in mathematics at the University of Leeds, and later coauthored Anthropic work on constitutional classifiers.

Researcher 1 report

Dylan Hadfield-Menell

Anthropic

Dylan Hadfield-Menell is a research scientist at Anthropic. His homepage says he studies how to understand and align increasingly capable AI systems, with additional interests in game theory and inverse reinforcement learning. MIT CSAIL's profile says he completed a PhD in computer science at the University of California, Berkeley advised by Stuart Russell.

Researcher 2 reports

Aengus Lynch

Anthropic

Aengus Lynch is a fifth-year PhD student in machine learning at Carnegie Mellon University advised by Zico Kolter. His homepage says he works on control, reinforcement learning, games, and machine learning, and his CV shows research internships at Anthropic and Google DeepMind after completing a BASc in engineering physics at the University of British Columbia.

Researcher 1 report

Avital Oliver

Anthropic

Avital Oliver is listed as an author of the Anthropic technical report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Cameron Raymond

Anthropic

Researcher at Anthropic and coauthor of the Anthropic technical report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Saffron Huang

Anthropic

Saffron Huang is listed as an author of the Anthropic technical report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Zara Ahmed

Anthropic

Researcher at Anthropic and coauthor of the Anthropic technical report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Jules Christmann

Anthropic

Jules Christmann is listed as an author of the Anthropic technical report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Shibani Santurkar

Anthropic

Shibani Santurkar is listed as an author of the Anthropic technical report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.