Daniel Y. Fu

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
ThunderKittens: Simple, Fast, and Adorable AI Kernels
FlashAttention (GitHub)