Tracing the thoughts of a large language model

Chief Science Officer and Co-Founder of Anthropic, with public bios emphasizing scaling laws and large language models.

Research scientist at Anthropic who leads the Societal Impacts team and works on AI evaluation, alignment, and societal impacts.

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Associate Professor at the University of Toronto whose research spans deep learning, probabilistic modeling, and machine learning methods for science and AI safety.

Buck Shlegeris is a Member of Technical Staff at Anthropic whose public homepage focuses on AI safety, model evaluations, and alignment.

Josh Batson is a research scientist at Anthropic. Public descriptions of his work emphasize understanding how and why AI systems work, especially interpretability.

Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.

Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.

Aengus Lynch is a fifth-year PhD student in machine learning at Carnegie Mellon University advised by Zico Kolter. His homepage says he works on control, reinforcement learning, games, and machine learning, and his CV shows research internships at Anthropic and Google DeepMind after completing a BASc in engineering physics at the University of British Columbia.

Will McCrostie is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Nikhil Prakash is a research scientist at Anthropic. His homepage says he recently completed a PhD in computer science at Berkeley advised by Jacob Steinhardt, after earlier work at Mila and Carnegie Mellon University. He studies learning in deep networks and the principles that lead to emergent phenomena.

Nora Belrose is an AI researcher whose work studies neural language models, latent structure, and cognition. She has contributed to Anthropic research on tracing and interpreting reasoning in large language models.

Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.

Andy Zou is a final-year PhD student in the Language Technologies Institute at Carnegie Mellon University. His homepage says he studies large language models, reasoning, coding, and AI safety, and his CV shows previous internships at Scale AI and Google after BS and MS degrees in electrical engineering and computer sciences from UC Berkeley.

Carl Vondrick is an Assistant Professor of Computer Science at Columbia University and a researcher at Apple whose work spans computer vision, machine learning, and multimodal systems. He previously worked as a research scientist at Google and a visiting researcher at Cruise, completed his PhD at MIT in 2017 after a BS at UC Irvine, and coauthored Anthropic interpretability work such as Tracing the thoughts of a large language model.

Henk Tillman is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Shiva R. Pujari is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Kevin J. Liu is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.

Brian C. Smith is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Sharon Qian is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Jared Kaplan

Deep Ganguli

Ethan Perez

Nicholas Schiefer

Samuel Marks

David Duvenaud

Buck Shlegeris

Josh Batson

David Bau

Alex Tamkin

Aengus Lynch

Will McCrostie

Nikhil Prakash

Nora Belrose

Canal Yuen

Andy Zou

Carl Vondrick

Henk Tillman

Shiva R. Pujari

Kevin J. Liu

Robert Ritz

Dion Lampris

Brian C. Smith

Sharon Qian