LLMpeople
Public Atlas People first, reports as evidence, organizations as context.

Jacob Hilton

Jacob Hilton is a researcher and executive director at the Alignment Research Center, where he works on mechanistic approaches to outperforming random sampling. He previously worked at OpenAI on truthfulness, reinforcement learning, and interpretability for language models, and before that at Jane Street. He completed a PhD in mathematics at the University of Leeds and has since coauthored Anthropic work on constitutional classifiers.

Alignment Research Center executive director and researcher · 1 organization · 2 reports

Profile status: updated

Trust signals

Profile completeness: 100%
Public sources: 4
Official sources: 2
Last reviewed: Not reviewed yet
Official homepage · Scholar profile · Structured work · Structured education
ai safety · reinforcement learning · interpretability

Current frame

Alignment Research Center executive director and researcher

Education

University of Leeds · Ph.D. · Mathematics · 2016

Work

Alignment Research Center · Executive Director and Researcher
OpenAI · Role not listed
Jane Street · Role not listed

Public links

Personal website · GitHub · OpenReview

Organizations

core Anthropic

Reports

Alignment and Safety · Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Alignment and Safety · Constitutional Classifiers++: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Official and primary sources

Jacob Hilton's Homepage · Official source · homepage · jacobh.co.uk
Jacob Hilton - OpenReview · Official source · openreview · OpenReview

Supporting sources

jacobhilton · Supporting source · github · GitHub
Combinatorics of countable ordinal topologies · Supporting source · other · White Rose eTheses Online

LLMpeople is a public atlas for discovering frontier AI researchers with context, provenance, and respect.
