LLMpeople
Public Atlas People first, reports as evidence, organizations as context.

Alignment faking in large language models

Alignment and Safety report from Anthropic with 20 connected researchers in the LLMpeople atlas.

Anthropic · 2024-12-18 · 20 researchers
Field: Alignment and Safety
Organization: Anthropic
arXiv: 2412.14093

Canonical link

https://arxiv.org/abs/2412.14093

Connected researchers

Researcher 6 reports

Jared D. Kaplan

Anthropic

Jared D. Kaplan is a co-founder and Chief Science Officer at Anthropic. Anthropic's public materials also identify him as the company's Responsible Scaling Officer.

Anthropic
Researcher 8 reports

Ethan Perez

Anthropic

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Anthropic
Researcher 5 reports

Samuel R. Bowman

Anthropic

Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University (on leave). His public homepage focuses on natural language processing, machine learning, and AI alignment.

Anthropic
United States
Researcher 6 reports

Samuel Marks

Anthropic

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Anthropic
Researcher 2 reports

Evan Hubinger

Anthropic

Evan Hubinger is Head of Alignment Stress-Testing at Anthropic, where he works on AI safety and alignment. He previously worked at MIRI and OpenAI, studied mathematics and computer science at Harvey Mudd College, and is known for work on inner alignment, deceptive alignment, and alignment stress-testing.

Anthropic
Researcher 2 reports

Carson Denison

Anthropic

Member of Technical Staff at Anthropic and PhD student at Carnegie Mellon University focused on AI safety, evaluations, and oversight of large language models.

Anthropic
Researcher 2 reports

Monte MacDiarmid

Anthropic

Member of technical staff at Anthropic working on alignment science and the evaluation of hidden objectives in language models.

Anthropic
Researcher 4 reports

David Duvenaud

Anthropic

Associate Professor at the University of Toronto whose research spans deep learning, probabilistic modeling, and machine learning methods for science and AI safety.

Anthropic
Canada
Researcher 3 reports

Sören Mindermann

Anthropic

Research scientist at Anthropic working on machine learning and AI safety.

Anthropic
Researcher 2 reports

Ryan Greenblatt

Anthropic

Ryan Greenblatt is chief scientist at Redwood Research. His public Redwood and Forethought profiles identify him as part of Redwood's AI safety team and say he holds a BS in applied mathematics and computer science from Brown University.

Anthropic
Researcher 3 reports

Buck Shlegeris

Anthropic

Buck Shlegeris is CEO of Redwood Research, where his work focuses on AI safety, model evaluations, and alignment.

Anthropic
Researcher 1 report

Johannes Treutlein

Anthropic

Member of Technical Staff at Anthropic and researcher in neural circuits and mechanistic interpretability, building tools for understanding AI systems.

Anthropic
Researcher 1 report

Jack Chen

Anthropic

Researcher at Anthropic with interests in machine learning, AI alignment, and economics.

Anthropic
Researcher 1 report

Linda Petrini

Anthropic

Research scientist at Anthropic focused on safety and robustness for language models and reinforcement learning.

Anthropic

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.