LLMpeople
Public Atlas People first, reports as evidence, organizations as context.

Tracing the thoughts of a large language model

Interpretability report from Anthropic with 30 connected researchers in the LLMpeople atlas.

Anthropic · 2025-03-27 · 30 researchers
Field: Interpretability
Organization: Anthropic
arXiv: N/A

Canonical link

https://www.anthropic.com/research/tracing-thoughts-language-model

Connected researchers

Researcher · 2 reports

Jared Kaplan

Anthropic

Chief Science Officer and Co-Founder of Anthropic, with public bios emphasizing scaling laws and large language models.

Anthropic
Researcher · 6 reports

Deep Ganguli

Anthropic

Research scientist at Anthropic who leads the Societal Impacts team and works on AI evaluation, alignment, and societal impacts.

Anthropic
United States
Researcher · 8 reports

Ethan Perez

Anthropic

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Anthropic
Researcher · 8 reports

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Anthropic
Researcher · 6 reports

Samuel Marks

Anthropic

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Anthropic
Researcher · 4 reports

David Duvenaud

Anthropic

Associate Professor at the University of Toronto whose research spans deep learning, probabilistic modeling, and machine learning methods for science and AI safety.

Anthropic
Canada
Researcher · 3 reports

Buck Shlegeris

Anthropic

Buck Shlegeris is a Member of Technical Staff at Anthropic whose public homepage focuses on AI safety, model evaluations, and alignment.

Anthropic
Researcher · 2 reports

Josh Batson

Anthropic

Josh Batson is a research scientist at Anthropic. Public descriptions of his work emphasize understanding how and why AI systems work, especially interpretability.

Anthropic
Researcher · 3 reports

David Bau

Anthropic

Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.

Anthropic
United States
Researcher · 3 reports

Alex Tamkin

Anthropic

Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.

Anthropic
Researcher · 2 reports

Aengus Lynch

Anthropic

Aengus Lynch is a fifth-year PhD student in machine learning at Carnegie Mellon University advised by Zico Kolter. His homepage says he works on control, reinforcement learning, games, and machine learning, and his CV shows research internships at Anthropic and Google DeepMind after completing a BASc in engineering physics at the University of British Columbia.

Anthropic
Researcher · 2 reports

Will McCrostie

Anthropic

Will McCrostie is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Anthropic
Researcher · 2 reports

Nikhil Prakash

Anthropic

Nikhil Prakash is a research scientist at Anthropic. His homepage says he recently completed a PhD in computer science at Berkeley advised by Jacob Steinhardt, after earlier work at Mila and Carnegie Mellon University. He studies learning in deep networks and the principles that lead to emergent phenomena.

Anthropic
Researcher · 2 reports

Nora Belrose

Anthropic

Nora Belrose is an AI researcher whose work studies neural language models, latent structure, and cognition. She has contributed to Anthropic research on tracing and interpreting reasoning in large language models.

Anthropic
Researcher · 1 report

Canal Yuen

Anthropic

Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.

Anthropic
Researcher · 1 report

Andy Zou

Anthropic

Andy Zou is a final-year PhD student in the Language Technologies Institute at Carnegie Mellon University. His homepage says he studies large language models, reasoning, coding, and AI safety, and his CV shows previous internships at Scale AI and Google after BS and MS degrees in electrical engineering and computer sciences from UC Berkeley.

Anthropic
Researcher · 1 report

Carl Vondrick

Anthropic

Carl Vondrick is an Assistant Professor of Computer Science at Columbia University and a researcher at Apple whose work spans computer vision, machine learning, and multimodal systems. He previously worked as a research scientist at Google and a visiting researcher at Cruise, completed his PhD at MIT in 2017 after a BS at UC Irvine, and coauthored Anthropic interpretability work such as Tracing the thoughts of a large language model.

Anthropic
Researcher · 1 report

Henk Tillman

Anthropic

Henk Tillman is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Anthropic
Researcher · 1 report

Shiva R. Pujari

Anthropic

Shiva R. Pujari is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Anthropic
Researcher · 1 report

Kevin J. Liu

Anthropic

Kevin J. Liu is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Anthropic
Researcher · 1 report

Robert Ritz

Anthropic

Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.

Anthropic
Researcher · 1 report

Dion Lampris

Anthropic

Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.

Anthropic
Researcher · 1 report

Brian C. Smith

Anthropic

Brian C. Smith is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Anthropic
Researcher · 1 report

Sharon Qian

Anthropic

Sharon Qian is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.

Anthropic

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.