updated 4 public sources
alignment, neural networks, mechanistic interpretability

Current frame

Independent mechanistic interpretability researcher and former Anthropic model diffing engineer