Publications

2025

  1. Marconi: Prefix Caching for the Era of Hybrid LLMs
    Rui Pan, Zhuang Wang, Zhen Jia, Can Karakus, Luca Zancato, Tri Dao, Ravi Netravali, and Yida Wang
    In Machine Learning and Systems (MLSys), 2025
  2. M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
    Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M Rush, and Tri Dao
    arXiv preprint arXiv:2504.10449, 2025
  3. HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model
    Mingqian Ma, Guoqing Liu, Chuan Cao, Pan Deng, Tri Dao, Albert Gu, Peiran Jin, Zhao Yang, Yingce Xia, Renqian Luo, and others
    arXiv preprint arXiv:2502.10807, 2025
  4. Thinking slow, fast: Scaling inference compute with distilled reasoners
    Daniele Paliotta, Junxiong Wang, Matteo Pagliardini, Kevin Y Li, Aviv Bick, J Zico Kolter, Albert Gu, François Fleuret, and Tri Dao
    arXiv preprint arXiv:2502.20339, 2025
  5. Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping
    Muru Zhang, Mayank Mishra, Zhongzhu Zhou, William Brandon, Jue Wang, Yoon Kim, Jonathan Ragan-Kelley, Shuaiwen Leon Song, Ben Athiwaratkun, and Tri Dao
    In International Conference on Machine Learning (ICML), 2025

2024

  1. FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
    Jay Shah*, Ganesh Bikshandi*, Ying Zhang, Vijay Thakkar, Pradeep Ramani, and Tri Dao
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  2. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
    Tri Dao*, and Albert Gu*
    In International Conference on Machine Learning (ICML), 2024
  3. RedPajama: an open dataset for training large language models
    Maurice Weber, Daniel Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, Ben Athiwaratkun, Rahul Chalamala, Kezhen Chen, Max Ryabinin, Tri Dao, Percy Liang, Christopher Ré, Irina Rish, and Ce Zhang
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  4. The mamba in the llama: Distilling and accelerating hybrid models
    Junxiong Wang, Daniele Paliotta, Avner May, Alexander M Rush, and Tri Dao
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  5. Hydra: Bidirectional state space models through generalized matrix mixers
    Sukjun Hwang, Aakash Lahoti, Tri Dao, and Albert Gu
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  6. BitDelta: Your fine-tune may only be worth one bit
    James Liu, Guangxuan Xiao, Kai Li, Jason D Lee, Song Han, Tri Dao, and Tianle Cai
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  7. An Empirical Study of Mamba-based Language Models
    Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, and others
    arXiv preprint arXiv:2406.07887, 2024
  8. StarCoder 2 and The Stack v2: The next generation
    Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, and others
    arXiv preprint arXiv:2402.19173, 2024
  9. Caduceus: Bi-directional equivariant long-range DNA sequence modeling
    Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, and Volodymyr Kuleshov
    In International Conference on Machine Learning (ICML), 2024
  10. Medusa: Simple LLM inference acceleration framework with multiple decoding heads
    Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D Lee, Deming Chen, and Tri Dao
    In International Conference on Machine Learning (ICML), 2024
  11. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
    Tri Dao
    In International Conference on Learning Representations (ICLR), 2024

2023

  1. Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    Albert Gu*, and Tri Dao*
    In Conference on Language Modeling (COLM), 2024
  2. Deja Vu: Contextual sparsity for efficient LLMs at inference time
    Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Ré, and Beidi Chen
    In International Conference on Machine Learning (ICML), 2023
  3. Effectively modeling time series with simple discrete state spaces
    Michael Zhang, Khaled K Saab, Michael Poli, Tri Dao, Karan Goel, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2023
  4. Hyena Hierarchy: Towards Larger Convolutional Language Models
    Michael Poli*, Stefano Massaroli*, Eric Nguyen, Daniel Y Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2023
  5. Simple Hardware-Efficient Long Convolutions for Sequence Modeling
    Daniel Y. Fu*, Elliot L Epstein*, Eric Nguyen, Armin W Thomas, Michael Zhang, Tri Dao, Atri Rudra, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2023
  6. Hungry Hungry Hippos: Towards Language Modeling with State Space Models
    Tri Dao*, Daniel Y. Fu*, Khaled K. Saab, Armin W. Thomas, Atri Rudra, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2023

2022

  1. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
    Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  2. Monarch: Expressive Structured Matrices for Efficient and Accurate Training
    Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2022
  3. Decentralized Training of Foundation Models in Heterogeneous Environments
    Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, and Ce Zhang
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  4. Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
    Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, and Ce Zhang
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  5. Transform Once: Efficient Operator Learning in Frequency Domain
    Michael Poli, Stefano Massaroli, Federico Berto, Jinkyoo Park, Tri Dao, Christopher Ré, and Stefano Ermon
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  6. S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces
    Eric Nguyen, Karan Goel, Albert Gu, Gordon Downs, Preey Shah, Tri Dao, Stephen Baccus, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  7. ButterflyFlow: Building Invertible Layers with Butterfly Matrices
    Chenlin Meng, Linqi Zhou, Kristy Choi, Tri Dao, and Stefano Ermon
    In International Conference on Machine Learning (ICML), 2022
  8. Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models
    Tri Dao*, Beidi Chen*, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2022

2021

  1. Scatterbrain: Unifying Sparse and Low-rank Attention
    Beidi Chen*, Tri Dao*, Eric Winsor, Zhao Song, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  2. Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers
    Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  3. Rethinking Neural Operations for Diverse Tasks
    Nicholas Roberts, Mikhail Khodak, Tri Dao, Liam Li, Christopher Ré, and Ameet Talwalkar
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  4. Catformer: Designing Stable Transformers via Sensitivity Analysis
    Jared Q Davis*, Albert Gu*, Krzysztof Choromanski, Tri Dao, Christopher Ré, Chelsea Finn, and Percy Liang
    In International Conference on Machine Learning (ICML), 2021
  5. Knowledge Distillation as Semiparametric Inference
    Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, and Lester Mackey
    In International Conference on Learning Representations (ICLR), 2021
  6. MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training
    Beidi Chen, Zichang Liu, Binghui Peng, Zhaozhuo Xu, Jonathan Lingjie Li, Tri Dao, Zhao Song, Anshumali Shrivastava, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2021

2020

  1. HiPPO: Recurrent Memory with Optimal Polynomial Projections
    Albert Gu*, Tri Dao*, Stefano Ermon, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2020
  2. Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
    Tri Dao, Nimit Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2020

2019

  1. On the downstream performance of compressed word embeddings
    Avner May, Jian Zhang, Tri Dao, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2019
  2. Approximating the Permanent by Sampling from Adaptive Partitions
    Jonathan Kuck, Tri Dao, Hamid Rezatofighi, Ashish Sabharwal, and Stefano Ermon
    In Advances in Neural Information Processing Systems (NeurIPS), 2019
  3. Adaptive Hashing for Model Counting
    Jonathan Kuck, Tri Dao, Shengjia Zhao, Burak Bartan, Ashish Sabharwal, and Stefano Ermon
    In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI), 2019
  4. Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
    Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2019
  5. A Kernel Theory of Modern Data Augmentation
    Tri Dao, Albert Gu, Alexander J Ratner, Virginia Smith, Christopher De Sa, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2019
  6. Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation
    Jian Zhang, Avner May, Tri Dao, and Christopher Ré
    In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019

2018

  1. Learning Compressed Transforms with Low Displacement Rank
    Anna T Thomas, Albert Gu, Tri Dao, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2018

2017

  1. Gaussian Quadrature for Kernel Features
    Tri Dao, Christopher M De Sa, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2017