Timothy Telleen-Lawton

Measuring Faithfulness in Chain-of-Thought Reasoning
Specific versus General Principles for Constitutional AI
Discovering Language Model Behaviors with Model-Written Evaluations
Constitutional AI: Harmlessness from AI Feedback
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback