Publications

2026

  1. FlashAttention-4: Algorithm and Kernel Pipelining Co-design for Asymmetric Hardware Scaling
    Ted Zadouri*, Jay Shah*, Markus Hohnerbach*, and 3 more authors
    In Machine Learning and Systems (MLSys), 2026
  2. Mamba-3: Improved Sequence Modeling using State Space Principles
    Aakash Lahoti, Kevin Li, Berlin Chen, and 5 more authors
    In International Conference on Learning Representations (ICLR), 2026
  3. SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
    Wentao Guo, Mayank Mishra, Xinle Cheng, and 2 more authors
    In International Conference on Learning Representations (ICLR), 2026
  4. Beat the Long Tail: Distribution-Aware Speculative Decoding for RL Training
    Zelei Shao, Vikranth Srivatsa, Sanjana Srivastava, and 8 more authors
    In Machine Learning and Systems (MLSys), 2026
  5. Speculative Speculative Decoding
    Tanishq Kumar, Tri Dao, and Avner May
    In International Conference on Learning Representations (ICLR), 2026
  6. Log-Linear Attention
    Han Guo, Songlin Yang, Tarushii Goel, and 3 more authors
    In International Conference on Learning Representations (ICLR), 2026
  7. M^2RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
    Mayank Mishra, Shawn Tan, Ion Stoica, and 2 more authors
    arXiv preprint arXiv:2603.14360, 2026
  8. AI+HW 2035: Shaping the Next Decade
    Deming Chen, Jason Cong, Azalia Mirhoseini, and 27 more authors
    arXiv preprint arXiv:2603.05225, 2026
  9. When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
    Junxiong Wang, Fengxiang Bie, Jisen Li, and 15 more authors
    arXiv preprint arXiv:2602.06932, 2026

2025

  1. Marconi: Prefix Caching for the Era of Hybrid LLMs
    Rui Pan, Zhuang Wang, Zhen Jia, and 5 more authors
    In Machine Learning and Systems (MLSys), 2025
  2. Hardware-Efficient Attention for Fast Decoding
    Ted Zadouri, Hubert Strauss, and Tri Dao
    In Conference on Language Modeling (COLM), 2025
  3. Long-Context State-Space Video World Models
    Ryan Po, Yotam Nitzan, Richard Zhang, and 5 more authors
    In International Conference on Computer Vision (ICCV), 2025
  4. M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
    Junxiong Wang, Wen-Ding Li, Daniele Paliotta, and 3 more authors
    arXiv preprint arXiv:2504.10449, 2025
  5. HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model
    Mingqian Ma, Guoqing Liu, Chuan Cao, and 8 more authors
    arXiv preprint arXiv:2502.10807, 2025
  6. Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
    Daniele Paliotta, Junxiong Wang, Matteo Pagliardini, and 6 more authors
    arXiv preprint arXiv:2502.20339, 2025
  7. Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping
    Muru Zhang, Mayank Mishra, Zhongzhu Zhou, and 7 more authors
    In International Conference on Machine Learning (ICML), 2025
  8. Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining
    Costin-Andrei Oncescu, Qingyang Wu, Wai Tong Chung, and 5 more authors
    arXiv preprint arXiv:2511.02237, 2025

2024

  1. FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
    Jay Shah*, Ganesh Bikshandi*, Ying Zhang, and 3 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  2. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
    Tri Dao* and Albert Gu*
    In International Conference on Machine Learning (ICML), 2024
  3. RedPajama: an Open Dataset for Training Large Language Models
    Maurice Weber, Daniel Fu, Quentin Anthony, and 16 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  4. The Mamba in the Llama: Distilling and Accelerating Hybrid Models
    Junxiong Wang, Daniele Paliotta, Avner May, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  5. Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
    Sukjun Hwang, Aakash Lahoti, Tri Dao, and 1 more author
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  6. BitDelta: Your Fine-Tune May Only Be Worth One Bit
    James Liu, Guangxuan Xiao, Kai Li, and 4 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  7. An Empirical Study of Mamba-based Language Models
    Roger Waleffe, Wonmin Byeon, Duncan Riach, and 8 more authors
    arXiv preprint arXiv:2406.07887, 2024
  8. StarCoder 2 and The Stack v2: The Next Generation
    Anton Lozhkov, Raymond Li, Loubna Ben Allal, and 8 more authors
    arXiv preprint arXiv:2402.19173, 2024
  9. Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
    Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, and 3 more authors
    In International Conference on Machine Learning (ICML), 2024
  10. Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
    Tianle Cai, Yuhong Li, Zhengyang Geng, and 4 more authors
    In International Conference on Machine Learning (ICML), 2024
  11. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
    Tri Dao
    In International Conference on Learning Representations (ICLR), 2024

2023

  1. Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    Albert Gu* and Tri Dao*
    In Conference on Language Modeling (COLM), 2024
  2. Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
    Zichang Liu, Jue Wang, Tri Dao, and 8 more authors
    In International Conference on Machine Learning (ICML), 2023
  3. Effectively Modeling Time Series with Simple Discrete State Spaces
    Michael Zhang, Khaled K. Saab, Michael Poli, and 3 more authors
    In International Conference on Learning Representations (ICLR), 2023
  4. Hyena Hierarchy: Towards Larger Convolutional Language Models
    Michael Poli*, Stefano Massaroli*, Eric Nguyen, and 6 more authors
    In International Conference on Machine Learning (ICML), 2023
  5. Simple Hardware-Efficient Long Convolutions for Sequence Modeling
    Daniel Y. Fu*, Elliot L Epstein*, Eric Nguyen, and 5 more authors
    In International Conference on Machine Learning (ICML), 2023
  6. Hungry Hungry Hippos: Towards Language Modeling with State Space Models
    Tri Dao*, Daniel Y. Fu*, Khaled K. Saab, and 3 more authors
    In International Conference on Learning Representations (ICLR), 2023
  7. StarCoder: may the source be with you!
    Raymond Li, Loubna Ben Allal, Yangtian Zi, and 64 more authors
    Transactions on Machine Learning Research (TMLR), 2023
  8. Flash-Decoding for long-context inference
    Tri Dao, Daniel Haziza, Francisco Massa, and 1 more author
    PyTorch Blog, 2023

2022

  1. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
    Tri Dao, Daniel Y. Fu, Stefano Ermon, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  2. Monarch: Expressive Structured Matrices for Efficient and Accurate Training
    Tri Dao, Beidi Chen, Nimit Sohoni, and 7 more authors
    In International Conference on Machine Learning (ICML), 2022
  3. Decentralized Training of Foundation Models in Heterogeneous Environments
    Binhang Yuan, Yongjun He, Jared Quincy Davis, and 6 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  4. Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
    Jue Wang, Binhang Yuan, Luka Rimanic, and 6 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  5. Transform Once: Efficient Operator Learning in Frequency Domain
    Michael Poli, Stefano Massaroli, Federico Berto, and 4 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  6. S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces
    Eric Nguyen, Karan Goel, Albert Gu, and 5 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  7. ButterflyFlow: Building Invertible Layers with Butterfly Matrices
    Chenlin Meng, Linqi Zhou, Kristy Choi, and 2 more authors
    In International Conference on Machine Learning (ICML), 2022
  8. Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models
    Tri Dao*, Beidi Chen*, Kaizhao Liang, and 4 more authors
    In International Conference on Learning Representations (ICLR), 2022

2021

  1. Scatterbrain: Unifying Sparse and Low-rank Attention
    Beidi Chen*, Tri Dao*, Eric Winsor, and 3 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  2. Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers
    Albert Gu, Isys Johnson, Karan Goel, and 4 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  3. Rethinking Neural Operations for Diverse Tasks
    Nicholas Roberts, Mikhail Khodak, Tri Dao, and 3 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  4. Catformer: Designing Stable Transformers via Sensitivity Analysis
    Jared Q Davis*, Albert Gu*, Krzysztof Choromanski, and 4 more authors
    In International Conference on Machine Learning (ICML), 2021
  5. Knowledge Distillation as Semiparametric Inference
    Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, and 1 more author
    In International Conference on Learning Representations (ICLR), 2021
  6. MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training
    Beidi Chen, Zichang Liu, Binghui Peng, and 6 more authors
    In International Conference on Learning Representations (ICLR), 2021

2020

  1. HiPPO: Recurrent Memory with Optimal Polynomial Projections
    Albert Gu*, Tri Dao*, Stefano Ermon, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2020
  2. Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
    Tri Dao, Nimit Sohoni, Albert Gu, and 5 more authors
    In International Conference on Learning Representations (ICLR), 2020

2019

  1. On the Downstream Performance of Compressed Word Embeddings
    Avner May, Jian Zhang, Tri Dao, and 1 more author
    In Advances in Neural Information Processing Systems (NeurIPS), 2019
  2. Approximating the Permanent by Sampling from Adaptive Partitions
    Jonathan Kuck, Tri Dao, Hamid Rezatofighi, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2019
  3. Adaptive Hashing for Model Counting
    Jonathan Kuck, Tri Dao, Shengjia Zhao, and 3 more authors
    In Conference on Uncertainty in Artificial Intelligence (UAI), 2019
  4. Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
    Tri Dao, Albert Gu, Matthew Eichhorn, and 2 more authors
    In International Conference on Machine Learning (ICML), 2019
  5. A Kernel Theory of Modern Data Augmentation
    Tri Dao, Albert Gu, Alexander J Ratner, and 3 more authors
    In International Conference on Machine Learning (ICML), 2019
  6. Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation
    Jian Zhang, Avner May, Tri Dao, and 1 more author
    In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019

2018

  1. Learning Compressed Transforms with Low Displacement Rank
    Anna T Thomas, Albert Gu, Tri Dao, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2018

2017

  1. Gaussian Quadrature for Kernel Features
    Tri Dao, Christopher M. De Sa, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2017