Canonical link
https://www.anthropic.com/research/tracing-thoughts-language-model
Interpretability report from Anthropic with 30 connected researchers in the LLMpeople atlas.
Connected researchers
Anthropic
Chief Science Officer and Co-Founder of Anthropic, with public bios emphasizing scaling laws and large language models.
Anthropic
Research scientist at Anthropic who leads the Societal Impacts team and works on AI evaluation, alignment, and societal impacts.
Anthropic
Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.
Anthropic
Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.
Anthropic
Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.
Anthropic
Associate Professor at the University of Toronto whose research spans deep learning, probabilistic modeling, and machine learning methods for science and AI safety.
Anthropic
Buck Shlegeris is a Member of Technical Staff at Anthropic whose public homepage focuses on AI safety, model evaluations, and alignment.
Anthropic
Josh Batson is a research scientist at Anthropic. Public descriptions of his work emphasize understanding how and why AI systems work, especially interpretability.
Anthropic
Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.
Anthropic
Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.
Anthropic
Aengus Lynch is a fifth-year PhD student in machine learning at Carnegie Mellon University advised by Zico Kolter. His homepage says he works on control, reinforcement learning, games, and machine learning, and his CV shows research internships at Anthropic and Google DeepMind after completing a BASc in engineering physics at the University of British Columbia.
Anthropic
Will McCrostie is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.
Anthropic
Nikhil Prakash is a research scientist at Anthropic. His homepage says he recently completed a PhD in computer science at Berkeley advised by Jacob Steinhardt, after earlier work at Mila and Carnegie Mellon University. He studies learning in deep networks and the principles that lead to emergent phenomena.
Anthropic
Nora Belrose is an AI researcher whose work studies neural language models, latent structure, and cognition. She has contributed to Anthropic research on tracing and interpreting reasoning in large language models.
Anthropic
Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.
Anthropic
Andy Zou is a final-year PhD student in the Language Technologies Institute at Carnegie Mellon University. His homepage says he studies large language models, reasoning, coding, and AI safety, and his CV shows previous internships at Scale AI and Google after BS and MS degrees in electrical engineering and computer sciences from UC Berkeley.
Anthropic
Carl Vondrick is an Assistant Professor of Computer Science at Columbia University and a researcher at Apple whose work spans computer vision, machine learning, and multimodal systems. He previously worked as a research scientist at Google and a visiting researcher at Cruise, completed his PhD at MIT in 2017 after a BS at UC Irvine, and coauthored Anthropic interpretability work such as Tracing the thoughts of a large language model.
Anthropic
Henk Tillman is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.
Anthropic
Shiva R. Pujari is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.
Anthropic
Kevin J. Liu is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.
Anthropic
Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.
Anthropic
Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.
Anthropic
Brian C. Smith is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.
Anthropic
Sharon Qian is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.