Constitutional AI: Harmlessness from AI Feedback

Co-founder and CEO of Anthropic.

Amanda Askell is a philosopher and AI alignment researcher at Anthropic. Her personal site says she previously worked as a research scientist on the policy team at OpenAI.

Co-founder and Head of Policy at Anthropic. His public biography also notes earlier work as Policy Director at OpenAI, as a technical journalist, and as author of the Import AI newsletter.

Anthropic researcher whose work includes reinforcement learning from human feedback and Constitutional AI; previously a Sherman Fairchild Postdoctoral Scholar in theoretical high-energy physics at Caltech.

Researcher at Anthropic working on alignment, reasoning, and evaluation for large language models.

Anthropic report author listed on the RLHF, Constitutional AI, Collective Constitutional AI, and Many-shot Jailbreaking reports, with work on alignment and adversarial evaluation.

Anthropic report author whose public publication record includes work on language model evaluations, AI safety, and model behavior.

Dawn Drain is an Anthropic-affiliated researcher in the United States. Public sources list her as a coauthor of Anthropic's helpful and harmless assistant paper and credit her with 2022 software engineering publications, including work on code completion and automated repair with large language models.

Member of Technical Staff at Anthropic whose work focuses on understanding, evaluating, and improving large language models, with emphasis on reasoning, safety, and generalization.

Research scientist at Anthropic who leads the Societal Impacts team and works on AI evaluation, alignment, and societal impacts.

Tom Henighan works on large language model interpretability at Anthropic. He previously worked on scaling laws at OpenAI and machine learning engineering at Beehive AI, and he studied physics at Stanford after graduating from Ohio State University in 2010 with a degree in English, mathematics, and philosophy.

Researcher at Anthropic working on the alignment and evaluation of advanced AI systems.

Researcher at Anthropic whose public report authorships and scholarly profiles cover language model evaluation, AI safety, and robustness.

Member of Anthropic's Interpretability team, where he works on understanding how large language models work.

Anthropic report author whose public publication record includes work on language model calibration, interpretability, and AI safety.

Sheer El-Showk is a member of technical staff at Anthropic. His public profile on The Org says he is also CTO at Lore AI and founder of Nascent AI. According to the same profile, he previously worked as a senior software engineer at Coiled, held postdoctoral research fellowships at CERN and the CEA, and earned physics and mathematical physics degrees from UC Berkeley and the University of Amsterdam.

Nelson Elhage is an engineer and researcher at Anthropic, where he works on the pretraining team after earlier work on reverse-engineering large language models. He previously worked at Stripe and Ksplice/Oracle on systems software and is known for open-source systems projects such as livegrep and reptyr.

Staff software engineer at Anthropic building systems for AI safety, reliability, and alignment.

Member of technical staff at Anthropic working on AI systems and alignment, with published work on RLHF and constitutional methods for harmless assistants.

Software engineer at Anthropic working on infrastructure, tooling, model behavior, and multimodal systems.

Researcher focused on AI safety, reinforcement learning, and language models, with public work spanning red teaming, adversarial robustness, and model behavior.

Research scientist at Anthropic working on model behavior and interpretability.

Sam McCandlish is listed as an author of the Anthropic technical report Collective Constitutional AI: Aligning a Language Model with Public Input.

Christopher Olah is a co-founder of Anthropic whose public writing focuses on interpretability, neural network circuits, and deep learning visualization. His homepage notes earlier work at OpenAI and Google Brain.

Dario Amodei

Amanda Askell

Jack Clark

Yuntao Bai

Kamal Ndousse

Anna Chen

Nova DasSarma

Dawn Drain

Stanislav Fort

Deep Ganguli

Tom Henighan

Nicholas Joseph

Saurav Kadavath

Jackson Kernion

Tom Conerly

Sheer El-Showk

Nelson Elhage

Zac Hatfield-Dodds

Tristan Hume

Scott Johnston

Shauna Kravec

Tom Brown

Sam McCandlish

Chris Olah