Interpretability
Researchers connected to this field in the public atlas.
Deep Ganguli
Anthropic
Research scientist at Anthropic who leads the Societal Impacts team and works on AI evaluation, alignment, and societal impacts.
Aengus Lynch
Anthropic
Aengus Lynch is a fifth-year PhD student in machine learning at Carnegie Mellon University advised by Zico Kolter. His homepage says he works on control, reinforcement learning, games, and machine learning, and his CV shows research internships at Anthropic and Google DeepMind after completing a BASc in engineering physics at the University of British Columbia.
David Krueger
Anthropic
David Krueger is an assistant professor in robust, reasoning, and responsible AI at the University of Montreal and a core academic member at Mila. His homepage says he trained in deep learning under Yoshua Bengio, Roland Memisevic, and Aaron Courville from 2013 to 2021, was at the University of Cambridge from 2021 to 2024, and founded the nonprofit Evitable in 2025.
Jacob Steinhardt
Anthropic
Jacob Steinhardt is an Associate Professor of Statistics and EECS at UC Berkeley and the founder and CEO of Transluce, a nonprofit research lab building technology to understand frontier AI systems. He previously spent a postdoctoral year at OpenAI and Open Philanthropy after completing his PhD at Stanford University with Percy Liang, and he coauthored Anthropic interpretability work such as Tracing the thoughts of a large language model.
Andy Zou
Anthropic
Andy Zou is a final-year PhD student in the Language Technologies Institute at Carnegie Mellon University. His homepage says he studies large language models, reasoning, coding, and AI safety, and his CV shows previous internships at Scale AI and Google after BS and MS degrees in electrical engineering and computer sciences from UC Berkeley.
Carl Vondrick
Anthropic
Carl Vondrick is an Assistant Professor of Computer Science at Columbia University and a researcher at Apple whose work spans computer vision, machine learning, and multimodal systems. He previously worked as a research scientist at Google and a visiting researcher at Cruise, completed his PhD at MIT in 2017 after a BS at UC Irvine, and coauthored Anthropic interpretability work such as Tracing the thoughts of a large language model.
Max Tegmark
Anthropic
Max Tegmark is a physicist and professor at MIT whose work spans cosmology, fundamental physics, and the implications of advanced AI systems. After earning his PhD in physics from the University of California, Berkeley in 1994 and undergraduate training at the Royal Institute of Technology in Stockholm, he worked as a postdoctoral researcher at the University of Pennsylvania before joining MIT and later coauthored Anthropic interpretability work on large language models.
Samuel Marks
Anthropic
Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.
Nikhil Prakash
Anthropic
Nikhil Prakash is a research scientist at Anthropic. His homepage says he recently completed a PhD in computer science at Berkeley advised by Jacob Steinhardt, after earlier work at Mila and Carnegie Mellon University. He studies learning in deep networks and the principles that lead to emergent phenomena.
David Duvenaud
Anthropic
Associate Professor at the University of Toronto whose research spans deep learning, probabilistic modeling, and machine learning methods for science and AI safety.
Ethan Perez
Anthropic
Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.
Nicholas Schiefer
Anthropic
Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.
Nora Belrose
Anthropic
Nora Belrose is an AI researcher whose work studies neural language models, latent structure, and cognition. She has contributed to Anthropic research on tracing and interpreting reasoning in large language models.
David Bau
Anthropic
Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.
Josh Batson
Anthropic
Josh Batson is a research scientist at Anthropic. Public descriptions of his work emphasize understanding how and why AI systems work, especially interpretability.
Jared Kaplan
Anthropic
Chief Science Officer and Co-Founder of Anthropic, with public bios emphasizing scaling laws and large language models.
Alex Tamkin
Anthropic
Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.
Buck Shlegeris
Anthropic
Buck Shlegeris is a Member of Technical Staff at Anthropic whose public homepage focuses on AI safety, model evaluations, and alignment.
Will McCrostie
Anthropic
Will McCrostie is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.
Canal Yuen
Anthropic
Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.
David Janz
Anthropic
Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.
Dion Lampris
Anthropic
Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.
Robert Ritz
Anthropic
Researcher at Anthropic and coauthor of Tracing the thoughts of a large language model.
Benjamin Crouzier
Anthropic
Benjamin Crouzier is listed as an author of the Anthropic technical report On the Biology of a Large Language Model.
Brian C. Smith
Anthropic
Brian C. Smith is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.
Can Rager
Anthropic
Can Rager is listed as an author of the Anthropic technical report On the Biology of a Large Language Model.
Eric J. Michaud
Anthropic
Eric J. Michaud is listed as an author of the Anthropic technical report On the Biology of a Large Language Model.
Henk Tillman
Anthropic
Henk Tillman is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.
Kevin J. Liu
Anthropic
Kevin J. Liu is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.
Sharon Qian
Anthropic
Sharon Qian is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.
Shiva R. Pujari
Anthropic
Shiva R. Pujari is listed as an author of the Anthropic technical report Tracing the thoughts of a large language model.
Alex Turner
Anthropic
Researcher in alignment science at Anthropic focused on AI safety and alignment.
Murray Shanahan
Anthropic
Emeritus Professor of Cognitive Robotics at Imperial College London whose public work focuses on artificial intelligence, robotics, and consciousness.
Pieter Abbeel
Anthropic
Computer scientist and robotics researcher whose public work focuses on reinforcement learning, imitation learning, and large-scale AI systems.
Stephen Casper
Anthropic
Alignment science researcher at Anthropic whose work focuses on black-box evaluations, white-box evaluations, and AI risk.
Yonatan Belinkov
Anthropic
Associate Professor in the Technion Faculty of Data and Decision Sciences and a visiting research professor at Google working on natural language processing and machine learning.