Researcher Profile
Song Han
One of the clearest researchers to follow for efficient AI systems, especially the line of work that makes large models smaller, faster, and easier to deploy without giving up too much quality.
A strong systems page because his work repeatedly shows up where inference efficiency meets usable long context, especially in attention sinks, StreamingLLM, post-training quantization, and later long-context head designs.
Organizations
Associate professor at MIT and distinguished scientist at NVIDIA
Topics
Streaming + long-context stability (attention sinks)
About This Page
This profile is meant to help you get oriented quickly: why this researcher matters, what to read first, and where to explore next.
Last reviewed: March 18, 2026
Known For
The ideas, systems, and research directions that make this person worth knowing.
01 Model compression and quantization
02 Efficient AI systems
03 Deployment-focused optimization for large models
04 Streaming + long-context stability (attention sinks; sketched in the snippet after this list)
05 Efficient Streaming Language Models with Attention Sinks (the StreamingLLM paper)
06 Long context
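Since attention sinks and StreamingLLM anchor this profile, a minimal sketch of the cache policy behind them may help: keep the key/value entries for a few initial "sink" tokens plus a rolling window of recent tokens, and evict everything in between. The function name, sink count, and window size below are illustrative assumptions, not the paper's reference implementation.

```python
# A minimal sketch of a StreamingLLM-style KV-cache policy.
# streaming_kv_positions is a hypothetical helper; the defaults
# (4 sinks, 1024-token window) are assumptions for illustration.

def streaming_kv_positions(seq_len: int, num_sinks: int = 4,
                           window: int = 1024) -> list[int]:
    """Return the token positions whose key/value entries stay cached."""
    if seq_len <= num_sinks + window:
        return list(range(seq_len))                  # everything still fits
    sinks = list(range(num_sinks))                   # initial "attention sink" tokens
    recent = list(range(seq_len - window, seq_len))  # rolling window of recent tokens
    return sinks + recent

# At step 5,000 the cache holds positions 0-3 plus 3,976-4,999:
print(len(streaming_kv_positions(5_000)))  # -> 1028, constant from here on
```

The point of the design is that cache size stays constant no matter how long the stream runs, while the retained sink tokens keep attention numerically stable.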
Start Here
Canonical papers, project pages, or repositories that anchor this profile.
Signature Works
Additional papers, projects, or repositories that help flesh out the profile.
Supporting Sources
Additional links that help verify and flesh out this profile.
Related Researchers
People worth exploring next because they share topics, labs, or source material with this profile.
A high-signal researcher for the systems side of modern AI, especially where reinforcement learning, memory-efficient large-model training, and long-context inference meet.
A strong researcher to follow for efficient and long-context LLM systems, especially where inference tricks and memory management make large models practical to run.
A strong researcher to study for the modern NLP stack because his work spans denoising pretraining, retrieval-augmented generation, and later long-context inference tricks rather than only one phase of the language-model pipeline.
A high-signal researcher for the latency and systems side of modern language models, especially where clever decoding tricks turn frontier models into usable products.
An important systems page because he is one of the named authors of speculative decoding, a technique that became part of the mainstream conversation about making large-model inference materially faster without changing the output distribution.
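Since this blurb hinges on speculative decoding, here is a minimal sketch of its accept/reject step, with toy samplers standing in for real models; draft_probs, target_probs, and the vocabulary size are assumptions for illustration. In a real system the target model scores all k draft tokens in one batched forward pass, which is where the speedup comes from; the sketch only shows the correctness-preserving sampling rule.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary size (illustrative assumption)

def draft_probs(prefix):
    # Stand-in for the small, fast draft model's next-token distribution.
    return rng.dirichlet(np.ones(VOCAB))

def target_probs(prefix):
    # Stand-in for the large target model's next-token distribution.
    return rng.dirichlet(np.ones(VOCAB))

def speculative_step(prefix, k=4):
    """Propose up to k draft tokens, accepting or rejecting each one so the
    final samples exactly match the target model's distribution."""
    out = list(prefix)
    for _ in range(k):
        q = draft_probs(out)
        x = int(rng.choice(VOCAB, p=q))           # cheap draft proposal
        p = target_probs(out)                     # target's view of the same step
        if rng.random() < min(1.0, p[x] / q[x]):  # accept with prob min(1, p/q)
            out.append(x)
        else:
            residual = np.maximum(p - q, 0.0)     # resample from the leftover mass
            out.append(int(rng.choice(VOCAB, p=residual / residual.sum())))
            break                                 # stop at the first rejection
    return out

print(speculative_step([0]))
```

When drafts are usually accepted, several tokens emerge per target-model pass, which is the latency win the blurb above refers to.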