Auditing language models for hidden objectives
Alignment and Safety report from Anthropic with 13 connected researchers in the LLMpeople atlas.
Connected researchers
Amanda Askell
Anthropic / OpenAI
Amanda Askell is a philosopher and AI alignment researcher at Anthropic. Her personal site says she previously worked as a research scientist on the policy team at OpenAI.
Ethan Perez
Anthropic
Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.
Samuel R. Bowman
Anthropic
Member of Technical Staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University (on leave). His public homepage focuses on natural language processing, machine learning, and AI alignment.
Nicholas Schiefer
Anthropic
Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.
Sören Mindermann
Anthropic
Research scientist at Anthropic working on machine learning and AI safety.
Henry Sleight
Anthropic
PhD student at the University of Oxford working on AI safety, including scalable oversight and interpretability.
Benjamin Lermen
Anthropic
Benjamin Lermen is listed as an author of the Anthropic technical report Auditing language models for hidden objectives.
Josh Batson
Anthropic
Josh Batson is a research scientist at Anthropic. Public descriptions of his work emphasize interpretability: understanding how and why AI systems work.
Chenyan Zhang
Anthropic
Chenyan Zhang is listed as an author of the Anthropic technical report Auditing language models for hidden objectives.
Scott Emmons
Anthropic
Member of Technical Staff at Anthropic working on AI control, hidden objectives, alignment, and evaluations, with a background in language models, efficient training, and scientific machine learning.
Jan Leike
Anthropic
Anthropic alignment researcher whose personal site says he leads the Alignment Science team; previously co-led OpenAI's Superalignment team and earlier worked on reinforcement learning from human feedback at DeepMind.
Owain Evans
Anthropic
Assistant Professor of Computer Science at the University of Oxford whose research spans generalization, reasoning, and large language model agents.
David Bau
Anthropic
Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.