LLMpeople
Public Atlas: people first, reports as evidence, organizations as context.

Tracing the thoughts of a large language model

Interpretability

Anthropic, Undated, 30 researchers
Field: Interpretability
Organization: Anthropic
arXiv: 2503.21435

Canonical link: https://arxiv.org/abs/2503.21435

Connected researchers

Samuel Marks

Anthropic

Senior research engineer at Anthropic interested in agent foundations, model organisms of misalignment, and human-computer interaction.

Anthropic
Unknown 6

David Duvenaud

Anthropic

Associate Professor at the University of Toronto whose research spans deep learning, probabilistic modeling, and machine learning methods for science and AI safety.

Anthropic
Canada 4

Nora Belrose

Anthropic

Nora Belrose is an AI researcher whose work studies neural language models, latent structure, and cognition. She has contributed to Anthropic research on tracing and interpreting reasoning in large language models.

Anthropic
Unknown 2

David Bau

Anthropic

Research scientist at Anthropic and assistant professor of computer science at Northeastern University working on interpretability and model understanding.

Anthropic
United States 3

Josh Batson

Anthropic

Member of technical staff at Anthropic interested in understanding deep learning and AI safety; previously a research scientist at OpenAI.

Anthropic
Unknown 2

Ethan Perez

Anthropic

Research scientist at Anthropic focused on scalable oversight, AI safety, and language model evaluation; previously worked at New York University and Google.

Anthropic
Unknown 8

Nicholas Schiefer

Anthropic

Member of Technical Staff at Anthropic and cofounder of Oulipo Labs, working on language model safety, evaluations, and scientific forecasting.

Anthropic
Unknown 8

Deep Ganguli

Anthropic

Co-founder and head of alignment science at Anthropic.

Anthropic
Unknown 6

Alex Tamkin

Anthropic

Member of technical staff at Anthropic whose work focuses on language models, model understanding, and alignment.

Anthropic
Unknown 3

Buck Shlegeris

Anthropic

Buck Shlegeris is a Member of Technical Staff at Anthropic whose public homepage focuses on AI safety, model evaluations, and alignment.

Anthropic
Unknown 3

Jared Kaplan

Anthropic

Jared Kaplan is a researcher at Anthropic known for work on scaling laws and large language models.

Anthropic
Unknown 2

Alex Turner

Anthropic

Researcher in alignment science at Anthropic focused on AI safety and alignment.

Anthropic
Unknown 1

Murray Shanahan

Anthropic

Emeritus Professor of Cognitive Robotics at Imperial College London whose public work focuses on artificial intelligence, robotics, and consciousness.

Anthropic
United Kingdom 1

Pieter Abbeel

Anthropic

Computer scientist and robotics researcher whose public work focuses on reinforcement learning, imitation learning, and large-scale AI systems.

Anthropic
Unknown 1

Aengus Lynch

Anthropic

Profile still being enriched.

Anthropic
Unknown 2

Nikhil Prakash

Anthropic

Profile still being enriched.

Anthropic
Unknown 2

Will McCrostie

Anthropic

Profile still being enriched.

Anthropic
Unknown 2

Andy Zou

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

Brian C. Smith

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

Canal Yuen

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

Carl Vondrick

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

David Janz

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

Dion Lampris

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

Henk Tillman

Anthropic

Profile still being enriched.

Anthropic
Unknown 1

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.