LLMpeople
Public Atlas: people first, reports as evidence, organizations as context.


Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Alignment and Safety

Anthropic · Undated · 28 researchers
Field: Alignment and Safety
Organization: Anthropic
arXiv: 2601.04603

Canonical link: https://arxiv.org/abs/2601.04603

Connected researchers


Samuel Marks

Anthropic

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Unknown 6

Jack Clark

Anthropic / OpenAI

Co-founder and head of policy at Anthropic. He previously served as policy director at OpenAI, worked as a technology journalist, and writes the Import AI newsletter.

Unknown 7

Simon Goldstein

Anthropic

Assistant Professor of Philosophy at The University of Hong Kong and Research Fellow at Anthropic, working in ethics, epistemology, and social and political philosophy.

Unknown 1

Amanda Askell

Anthropic / OpenAI

Researcher at Anthropic, previously at OpenAI, working on making AI understandable to and aligned with human values.

Unknown 7

Kamal Ndousse

Anthropic

Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.

Unknown 5

Jan Leike

Anthropic

Anthropic researcher focused on AI safety, alignment, and auditing hidden objectives in language models.

Unknown 2

Ethan Perez

Anthropic

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Unknown 8

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and co-founder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Unknown 8

Deep Ganguli

Anthropic

Research scientist at Anthropic who leads the Societal Impacts team.

Unknown 6

Dario Amodei

Anthropic / OpenAI

CEO and co-founder of Anthropic. Before Anthropic, he served as vice president of research at OpenAI.

Unknown 5

Alex Tamkin

Anthropic

Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.

Unknown 3

Beth Barnes

Anthropic

President of METR and former team member at Anthropic whose work focuses on evaluating and forecasting frontier AI capabilities.

Unknown 2

Jared Kaplan

Anthropic

Co-founder of Anthropic known for work on scaling laws and large language models.

Unknown 2

Wes Gurnee

Anthropic

Member of technical staff at Anthropic working on deep learning, mechanistic interpretability, and AI safety.

Unknown 1

Tom Henighan

Anthropic

Profile still being enriched.

Unknown 3

Aengus Lynch

Anthropic

Profile still being enriched.

Unknown 2

Jacob Hilton

Anthropic

Profile still being enriched.

Unknown 2

William Saunders

Anthropic

Profile still being enriched.

Unknown 2

Will McCrostie

Anthropic

Profile still being enriched.

Unknown 2

Yanda Chen

Anthropic

Profile still being enriched.

Unknown 2

Avital Oliver

Anthropic

Profile still being enriched.

Unknown 1

Cameron Raymond

Anthropic

Profile still being enriched.

Unknown 1

Dylan Hadfield-Menell

Anthropic

Profile still being enriched.

Unknown 1

Jules Christmann

Anthropic

Profile still being enriched.

Unknown 1

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.