Jacob Hilton

Jacob Hilton is a researcher and the executive director of the Alignment Research Center, where he works on mechanistic approaches to outperforming random sampling. He previously worked at OpenAI on truthfulness, reinforcement learning, and interpretability for language models. Earlier, he worked at Jane Street and completed a PhD in mathematics at the University of Leeds. He later coauthored Anthropic work on constitutional classifiers.

Alignment Research Center executive director and researcher | 1 organization | 2 reports

Profile status: updated

4 public sources

AI safety, reinforcement learning, interpretability

Current frame

Alignment Research Center executive director and researcher