Researcher 7 reports

Amanda Askell

Anthropic / OpenAI

Amanda Askell is a philosopher and AI alignment researcher at Anthropic. Her personal site says she previously worked as a research scientist on the policy team at OpenAI.

Researcher 6 reports

Deep Ganguli

Anthropic

Research scientist at Anthropic who leads the Societal Impacts team and works on AI evaluation, alignment, and societal impacts.

Researcher 3 reports

Tom Henighan

Anthropic

Tom Henighan works on large language model interpretability at Anthropic. He previously worked on scaling laws at OpenAI and machine learning engineering at Beehive AI, and he studied physics at Stanford after graduating from Ohio State University in 2010 with a degree in English, mathematics, and philosophy.

Researcher 2 reports

Aengus Lynch

Anthropic

Aengus Lynch is a fifth-year PhD student in machine learning at Carnegie Mellon University advised by Zico Kolter. His homepage says he works on control, reinforcement learning, games, and machine learning, and his CV shows research internships at Anthropic and Google DeepMind after completing a BASc in engineering physics at the University of British Columbia.

Researcher 2 reports

Evan Hubinger

Anthropic

Evan Hubinger is Head of Alignment Stress-Testing at Anthropic, where he works on AI safety and alignment. He previously worked at MIRI and OpenAI, studied mathematics and computer science at Harvey Mudd College, and is known for work on inner alignment, deceptive alignment, and alignment stress-testing.

Researcher 2 reports

Jacob Hilton

Anthropic

Jacob Hilton is a researcher and executive director at the Alignment Research Center, where he works on mechanistic approaches to outperforming random sampling. He previously worked at OpenAI on truthfulness, reinforcement learning, and interpretability for language models. Before that he worked at Jane Street and completed a PhD in mathematics at the University of Leeds. He later coauthored Anthropic work on constitutional classifiers.

Researcher 2 reports

Sandipan Kundu

Anthropic

Sandipan Kundu is a member of technical staff at Anthropic. His public The Org profile says he previously held postdoctoral positions at Johns Hopkins and Cornell, worked and studied at the University of Texas at Austin, and earned a master's degree in physics from the Indian Institute of Technology Kanpur.

Researcher 2 reports

William Saunders

Anthropic

William Saunders is a research scientist at Anthropic working on aligning and evaluating language models. His public homepage says he works at the intersection of game theory, optimization, and deep learning, previously interned at OpenAI, DeepMind, and Mila, studied mathematics at the University of Oxford, and is a PhD student in machine learning at Carnegie Mellon University.

Researcher 1 report

Cem Anil

Anthropic

Cem Anil is a research scientist at Anthropic and part of the company's Alignment Science team. His homepage says he recently completed a PhD at the University of Toronto and Vector Institute supervised by Roger Grosse and Geoffrey Hinton. He studies the intersection of deep learning and AI safety, especially robustness and generalization in large language models and scaling laws for dangerous capabilities.

Researcher 4 reports

Saurav Kadavath

Anthropic

Researcher at Anthropic whose public report authorships and scholarly profiles cover language model evaluation, AI safety, and robustness.

Researcher 2 reports

Ryan Greenblatt

Anthropic

Ryan Greenblatt is chief scientist at Redwood Research. His public Redwood and Forethought profiles identify him as part of Redwood's AI safety team and say he holds a BS in applied mathematics and computer science from Brown University.

Researcher 2 reports

Liane Lovitt

Anthropic

Research scientist at Anthropic whose public work includes AI alignment, reinforcement learning from human feedback, and model behavior.

Researcher 6 reports

Samuel Marks

Anthropic

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Researcher 2 reports

Yanda Chen

Anthropic

Yanda Chen is a member of technical staff at Anthropic and a PhD candidate in computer science at Georgetown University advised by Kevin Knight. His homepage says he previously worked at the Allen Institute for AI and focuses on AI safety, natural language processing, and deep learning.

Researcher 1 report

Dan Hendrycks

Anthropic

Dan Hendrycks is the executive director of the Center for AI Safety and an advisor to xAI and Scale AI. His public homepage also says he received his PhD in AI from UC Berkeley and highlights contributions including GELU, robustness benchmarks, and MMLU.

Researcher 1 report

Dylan Hadfield-Menell

Anthropic

Dylan Hadfield-Menell is a research scientist at Anthropic. His homepage says he studies how to understand and align increasingly capable AI systems, with additional interests in game theory and inverse reinforcement learning. MIT CSAIL's profile says he completed a PhD in computer science at the University of California, Berkeley advised by Stuart Russell.

Researcher 5 reports

Kamal Ndousse

Anthropic

Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.

Researcher 7 reports

Jack Clark

Anthropic / OpenAI

Co-founder and Head of Policy at Anthropic. His public biography also notes earlier work as Policy Director at OpenAI and as a technical journalist, and identifies him as the author of the Import AI newsletter.

Researcher 1 report

Newton Cheng

Anthropic

Anthropic researcher on the Frontier Red Team focused on cyber misuse evaluation and threat modeling; previously a physics PhD student at UC Berkeley, and now also a mentor in the MATS program.

Researcher 2 reports

Jan Leike

Anthropic

Anthropic alignment researcher whose personal site says he leads the Alignment Science team; previously co-led OpenAI's Superalignment team and earlier worked on reinforcement learning from human feedback at DeepMind.

Researcher 5 reports

Samuel R. Bowman

Anthropic

Member of technical staff at Anthropic and associate professor of computer science, data science, and linguistics at New York University, currently on leave. His public homepage focuses on natural language processing, machine learning, and AI alignment.

Researcher 4 reports

David Duvenaud

Anthropic

Associate Professor at the University of Toronto whose research spans deep learning, probabilistic modeling, and machine learning methods for science and AI safety.

Researcher 8 reports

Ethan Perez

Anthropic

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Researcher 8 reports

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Researcher 5 reports

Dario Amodei

Anthropic / OpenAI

Co-founder and CEO of Anthropic.

Researcher 3 reports

Shauna Kravec

Anthropic

Researcher focused on AI safety, reinforcement learning, and language models, with public work spanning red teaming, adversarial robustness, and model behavior.

Researcher 5 reports

Nova DasSarma

Anthropic

Anthropic report author whose public publication record includes work on language model evaluations, AI safety, and model behavior.

Researcher 4 reports

Tom Conerly

Anthropic

Anthropic report author whose public publication record includes work on language model calibration, interpretability, and AI safety.

Researcher 1 report

Simon Goldstein

Anthropic

Assistant Professor of Philosophy at The University of Hong Kong and Research Fellow at Anthropic, working in ethics, epistemology, and social and political philosophy.

Researcher 1 report

Jesse Mu

Anthropic

Jesse Mu is a Research Scientist at Anthropic and a visiting researcher at Stanford University. His work spans machine learning, AI safety, reinforcement learning, and deep learning theory.

Researcher 1 report

Linda Petrini

Anthropic

Research scientist at Anthropic focused on safety and robustness for language models and reinforcement learning.

Researcher 1 report

Roger Grosse

Anthropic

Associate Professor of Computer Science at the University of Toronto and director of the machine learning group, with research spanning probabilistic models and optimization algorithms.

Researcher 6 reports

Jared D. Kaplan

Anthropic

Jared D. Kaplan is a co-founder and Chief Science Officer at Anthropic. Anthropic's public materials also identify him as the company's Responsible Scaling Officer.

Researcher 4 reports

Yuntao Bai

Anthropic

Anthropic researcher whose work includes reinforcement learning from human feedback and Constitutional AI; previously a Sherman Fairchild Postdoctoral Scholar in theoretical high-energy physics at Caltech.

Researcher 3 reports

David Bau

Anthropic

Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.

Researcher 4 reports

Anna Chen

Anthropic

Anthropic report author listed on RLHF, Constitutional AI, Collective Constitutional AI, and Many-shot Jailbreaking reports, with report-backed work on alignment and adversarial evaluation.

Researcher 2 reports

Josh Batson

Anthropic

Josh Batson is a research scientist at Anthropic. Public descriptions of his work emphasize understanding how and why AI systems work, especially interpretability.

Researcher 2 reports

Jared Kaplan

Anthropic

Chief Science Officer and Co-Founder of Anthropic, with public bios emphasizing scaling laws and large language models.

Researcher 1 report

Carina Kauf

Anthropic

Member of Anthropic's Societal Impacts team, where she studies the real-world impacts of AI systems.

Researcher 3 reports

Sören Mindermann

Anthropic

Research scientist at Anthropic working on machine learning and AI safety.

Researcher 1 report

Henry Sleight

Anthropic

PhD student at the University of Oxford working on AI safety, including scalable oversight and interpretability.

Researcher 1 report

Jack Chen

Anthropic

Researcher at Anthropic with interests in machine learning, AI alignment, and economics.

Researcher 1 report

Kshitij Sachan

Anthropic

Kshitij Sachan is a research scientist at Anthropic whose public homepage and Google Scholar profile highlight work on language models, reasoning, code generation, and machine learning systems.

Researcher 1 report

Michael Sellitto

Anthropic

Research scientist at Anthropic working on trustworthy AI and deceptive alignment.

Researcher 1 report

Mrinank Sharma

Anthropic

AI safety researcher who led Anthropic's Safeguards Research Team and worked on jailbreak robustness, automated red teaming, and monitoring for misuse and misalignment.

Researcher 1 report

Zachary Witten

Anthropic

Zachary Witten is a member of technical staff at Anthropic.

Researcher 3 reports

Alex Tamkin

Anthropic

Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.

Researcher 3 reports

Buck Shlegeris

Anthropic

Buck Shlegeris is a Member of Technical Staff at Anthropic whose public homepage focuses on AI safety, model evaluations, and alignment.

Researcher 2 reports

Will McCrostie

Anthropic

Will McCrostie is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Researcher 1 report

Cameron Raymond

Anthropic

Researcher at Anthropic and coauthor of the report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Maxwell Tegmark

Anthropic

Researcher at Anthropic and coauthor of the Constitutional Classifiers report.

Researcher 1 report

Valentin Meksyn

Anthropic

Researcher at Anthropic and coauthor of the report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Zara Ahmed

Anthropic

Researcher at Anthropic and coauthor of the report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 2 reports

Beth Barnes

Anthropic

President of METR and former team member at Anthropic whose work focuses on evaluating and forecasting frontier AI capabilities.

Researcher 2 reports

Carson Denison

Anthropic

Member of Technical Staff at Anthropic and PhD student at Carnegie Mellon University focused on AI safety, evaluations, and oversight of large language models.

Researcher 2 reports

Monte MacDiarmid

Anthropic

Member of technical staff at Anthropic working on alignment science and the evaluation of hidden objectives in language models.

Researcher 1 report

Avital Oliver

Anthropic

Avital Oliver is listed as an author of the Anthropic technical report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Benjamin Lermen

Anthropic

Benjamin Lermen is listed as an author of the Anthropic technical report Auditing language models for hidden objectives.

Researcher 1 report

Chenyan Zhang

Anthropic

Chenyan Zhang is listed as an author of the Anthropic technical report Auditing language models for hidden objectives.

Researcher 1 report

Daniel M. Ziegler

Anthropic

Daniel M. Ziegler is listed as an author of the Anthropic technical report Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.

Researcher 1 report

David McDougall

Anthropic

David McDougall is listed as an author of the Anthropic technical report Many-shot Jailbreaking.

Researcher 1 report

Jordan Taylor

Anthropic

Jordan Taylor is listed as an author of the Anthropic technical report Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Jules Christmann

Anthropic

Jules Christmann is listed as an author of the Anthropic technical report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Mantas Mazeika

Anthropic

Mantas Mazeika is listed as an author of the Anthropic technical report Many-shot Jailbreaking.

Researcher 1 report

Saffron Huang

Anthropic

Saffron Huang is listed as an author of the Anthropic technical report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Shibani Santurkar

Anthropic

Shibani Santurkar is listed as an author of the Anthropic technical report Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Tomás Riofrío

Anthropic

Tomás Riofrío is listed as an author of the Anthropic technical report Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

Researcher 1 report

Adam Jermyn

Anthropic

Research scientist at Anthropic and former professor of theoretical astrophysics at Stony Brook University.

Researcher 1 report

Alexey Nazarov

Anthropic

Member of technical staff at Anthropic focused on safe and reliable AI.

Researcher 1 report

Esin Durmus

Anthropic

Assistant professor of marketing at Stanford Graduate School of Business whose research uses AI systems to study human decision-making and related machine learning questions.

Researcher 1 report

Holden Karnofsky

Anthropic

Co-founder of GiveWell and Open Philanthropy and writer of the Cold Takes blog.

Researcher 1 report

Jan Brauner

Anthropic

Computer scientist at Anthropic focused on making advanced AI systems safe and beneficial.

Researcher 1 report

Johannes Treutlein

Anthropic

Member of Technical Staff at Anthropic and researcher in neural circuits and mechanistic interpretability, building tools for understanding AI systems.

Researcher 1 report

Owain Evans

Anthropic

Assistant Professor of Computer Science at the University of Oxford whose research spans generalization, reasoning, and large language model agents.

Researcher 1 report

Paul Christiano

Anthropic

Researcher focused on AI alignment, reasoning under uncertainty, and the long-term safety of advanced AI systems.

Researcher 1 report

Rylan Schaeffer

Anthropic

Research scientist at Anthropic focused on AI alignment, language model behavior, and scalable oversight.

Researcher 1 report

Scott Emmons

Anthropic

Member of Technical Staff at Anthropic working on AI control, hidden objectives, alignment, and evaluations, with a background in language models, efficient training, and scientific machine learning.

Researcher 1 report

Wes Gurnee

Anthropic

Member of technical staff at Anthropic working on deep learning, mechanistic interpretability, and AI safety.