LLMpeople
Public Atlas People first, reports as evidence, organizations as context.

Tom Henighan

Tom Henighan works on large language model interpretability at Anthropic. He previously worked on scaling laws at OpenAI and machine learning engineering at Beehive AI, and he studied physics at Stanford after graduating from Ohio State University in 2010 with a degree in English, mathematics, and philosophy.

Anthropic large language model interpretability researcher · 1 organization · 3 reports

Profile status: updated

Trust signals

Profile completeness: 100%
Public sources: 5
Official sources: 2
Last reviewed: Not reviewed yet
Official homepage · Scholar profile · Structured work · Structured education
alignment · scaling laws · interpretability

Current frame

Anthropic large language model interpretability researcher

Education

Ohio State University · English, mathematics, and philosophy · 2010
Stanford University Ph.D. · Physics

Work

Anthropic · Interpretability researcher
OpenAI · Role not listed
Beehive AI · Role not listed

Public links

Personal website · GitHub · Google Scholar

Organizations

Anthropic · core

Reports

Alignment and RLHF · Constitutional AI: Harmlessness from AI Feedback
Alignment and Safety · Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Alignment and RLHF · Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Official and primary sources

Tom Henighan · Official source · homepage · tomhenighan.com
Tom Henighan - Google Scholar · Official source · scholar · Google Scholar

Supporting sources

henighan · Supporting source · github · GitHub
Polarized Strange Quark Asymmetries with Tom Henighan · Supporting source · other · SLAC National Accelerator Laboratory
Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming · Supporting source · report · arXiv

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.