On the Biology of a Large Language Model
Interpretability
Connected researchers
Samuel Marks
Anthropic
Senior research engineer at Anthropic interested in interpretability, model organisms of misalignment, and AI safety.
David Duvenaud
Anthropic
Associate Professor at the University of Toronto whose research spans deep learning, probabilistic modeling, and machine learning methods for science and AI safety.
Nora Belrose
EleutherAI
AI researcher at EleutherAI whose work studies neural language models, latent structure, and cognition, including methods such as the tuned lens for interpreting models' intermediate computations.
David Bau
Northeastern University
Assistant professor of computer science at Northeastern University working on interpretability and model understanding.
Ethan Perez
Anthropic
Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.
Stephen Casper
Anthropic
Alignment science researcher at Anthropic whose work focuses on black-box evaluations, white-box evaluations, and AI risk.
Yonatan Belinkov
Technion – Israel Institute of Technology
Associate Professor in the Technion Faculty of Data and Decision Sciences and a visiting research professor at Google working on natural language processing and machine learning.
Nikhil Prakash
Northeastern University
PhD student at Northeastern University studying the interpretability of large language models.
Benjamin Crouzier
Anthropic
Profile still being enriched.
Can Rager
Anthropic
Profile still being enriched.
David Krueger
Mila / Université de Montréal
Researcher at Mila and the Université de Montréal working on deep learning and AI safety; previously led a research group at the University of Cambridge.
Eric J. Michaud
MIT
Researcher at MIT working on the science of deep learning, neural scaling laws, and interpretability.
Max Tegmark
MIT
Professor of physics at MIT whose research spans cosmology, the foundations of machine learning, and AI safety.