Tom Conerly

In-context Learning and Induction Heads
Language Models (Mostly) Know What They Know
Scaling Laws and Interpretability of Learning from Repeated Data
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Constitutional AI: Harmlessness from AI Feedback