LLMpeople
Public Atlas: people first, reports as evidence, organizations as context.


Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Alignment and Safety

Anthropic · Undated · 13 researchers
Field
Alignment and Safety
Organization
Anthropic
arXiv
2501.18837

Canonical link

https://arxiv.org/abs/2501.18837

Connected researchers


Liane Lovitt

Anthropic

Research scientist at Anthropic whose public work includes AI alignment, reinforcement learning from human feedback, and model behavior.

Anthropic
Unknown 2

Samuel Marks

Anthropic

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Anthropic
Unknown 6

Ethan Perez

Anthropic

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Anthropic
Unknown 8

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Anthropic
Unknown 8

Alex Tamkin

Anthropic

Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.

Anthropic
Unknown 3

Beth Barnes

Anthropic

President of METR and former team member at Anthropic whose work focuses on evaluating and forecasting frontier AI capabilities.

Anthropic
Unknown 2

Alexey Nazarov

Anthropic

Member of technical staff at Anthropic focused on safe and reliable AI.

Anthropic
Unknown 1

Jacob Hilton

Anthropic

Profile still being enriched.

Anthropic
Unknown 2

William Saunders

Anthropic

Profile still being enriched.

Anthropic
Unknown 2

Yanda Chen

Anthropic

Profile still being enriched.

Anthropic
Unknown 2

Jordan Taylor

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

Maxwell Tegmark

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

Tomás Riofrío

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.