Canonical link
https://www.anthropic.com/research/tracing-thoughts-language-model
Interpretability report from Anthropic with 13 connected researchers in the LLMpeople atlas.
Connected researchers
Anthropic
Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.
Anthropic
Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.
Anthropic
Associate Professor at the University of Toronto whose research spans deep learning, probabilistic modeling, and machine learning methods for science and AI safety.
Anthropic
Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.
Anthropic
Can Rager is listed as an author of the Anthropic technical report On the Biology of a Large Language Model.
Anthropic
Eric J. Michaud is listed as an author of the Anthropic technical report On the Biology of a Large Language Model.
Anthropic
Associate Professor in the Technion Faculty of Data and Decision Sciences and a visiting research professor at Google working on natural language processing and machine learning.
Anthropic
Nikhil Prakash is a research scientist at Anthropic. His homepage says he recently completed a PhD in computer science at Berkeley advised by Jacob Steinhardt, after earlier work at Mila and Carnegie Mellon University. He studies learning in deep networks and the principles that lead to emergent phenomena.
Anthropic
Alignment science researcher at Anthropic whose work focuses on black-box evaluations, white-box evaluations, and AI risk.
Anthropic
Nora Belrose is an AI researcher whose work studies neural language models, latent structure, and cognition. She has contributed to Anthropic research on tracing and interpreting reasoning in large language models.
Anthropic
David Krueger is an assistant professor in robust, reasoning, and responsible AI at the University of Montreal and a core academic member at Mila. His homepage says he trained in deep learning under Yoshua Bengio, Roland Memisevic, and Aaron Courville from 2013 to 2021, was at the University of Cambridge from 2021 to 2024, and founded the nonprofit Evitable in 2025.
Anthropic
Benjamin Crouzier is listed as an author of the Anthropic technical report On the Biology of a Large Language Model.
Anthropic
Max Tegmark is a physicist and professor at MIT whose work spans cosmology, fundamental physics, and the implications of advanced AI systems. He completed undergraduate training at the Royal Institute of Technology in Stockholm and earned his PhD in physics from the University of California, Berkeley in 1994, then worked as a postdoctoral researcher at the University of Pennsylvania before joining MIT. He later coauthored Anthropic interpretability work on large language models.