Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Alignment and Safety
Connected researchers
Liane Lovitt
Anthropic
Research scientist at Anthropic whose public work spans AI alignment, reinforcement learning from human feedback, and model behavior.
Samuel Marks
Anthropic
Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.
Ethan Perez
Anthropic
Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.
Nicholas Schiefer
Anthropic
Member of technical staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.
Alex Tamkin
Anthropic
Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.
Beth Barnes
Anthropic
President of METR and former team member at Anthropic whose work focuses on evaluating and forecasting frontier AI capabilities.
Alexey Nazarov
Anthropic
Member of technical staff at Anthropic focused on safe and reliable AI.
Jacob Hilton
Anthropic
William Saunders
Anthropic
Yanda Chen
Anthropic
Jordan Taylor
Anthropic
Maxwell Tegmark
Anthropic
Tomás Riofrío
Anthropic