Publications

2024

  1. FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
    Jay Shah*, Ganesh Bikshandi*, Ying Zhang, Vijay Thakkar, Pradeep Ramani, and Tri Dao
    2024
  2. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
    Tri Dao* and Albert Gu*
    In International Conference on Machine Learning (ICML), 2024
  3. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
    Tri Dao
    In International Conference on Learning Representations (ICLR), 2024

2023

  1. Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    Albert Gu* and Tri Dao*
    In Conference on Language Modeling (COLM), 2024
  2. Hyena Hierarchy: Towards Larger Convolutional Language Models
    Michael Poli*, Stefano Massaroli*, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2023
  3. Simple Hardware-Efficient Long Convolutions for Sequence Modeling
    Daniel Y. Fu*, Elliot L. Epstein*, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2023
  4. Hungry Hungry Hippos: Towards Language Modeling with State Space Models
    Tri Dao*, Daniel Y. Fu*, Khaled K. Saab, Armin W. Thomas, Atri Rudra, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2023

2022

  1. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
    Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  2. Monarch: Expressive Structured Matrices for Efficient and Accurate Training
    Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2022
  3. Decentralized Training of Foundation Models in Heterogeneous Environments
    Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, and Ce Zhang
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  4. Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
    Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, and Ce Zhang
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  5. Transform Once: Efficient Operator Learning in Frequency Domain
    Michael Poli, Stefano Massaroli, Federico Berto, Jinkyoo Park, Tri Dao, Christopher Ré, and Stefano Ermon
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  6. S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces
    Eric Nguyen, Karan Goel, Albert Gu, Gordon Downs, Preey Shah, Tri Dao, Stephen Baccus, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  7. ButterflyFlow: Building Invertible Layers with Butterfly Matrices
    Chenlin Meng, Linqi Zhou, Kristy Choi, Tri Dao, and Stefano Ermon
    In International Conference on Machine Learning (ICML), 2022
  8. Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models
    Tri Dao*, Beidi Chen*, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2022

2021

  1. Scatterbrain: Unifying Sparse and Low-rank Attention
    Beidi Chen*, Tri Dao*, Eric Winsor, Zhao Song, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  2. Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers
    Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  3. Rethinking Neural Operations for Diverse Tasks
    Nicholas Roberts, Mikhail Khodak, Tri Dao, Liam Li, Christopher Ré, and Ameet Talwalkar
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  4. Catformer: Designing Stable Transformers via Sensitivity Analysis
    Jared Q. Davis*, Albert Gu*, Krzysztof Choromanski, Tri Dao, Christopher Ré, Chelsea Finn, and Percy Liang
    In International Conference on Machine Learning (ICML), 2021
  5. Knowledge Distillation as Semiparametric Inference
    Tri Dao, Govinda M. Kamath, Vasilis Syrgkanis, and Lester Mackey
    In International Conference on Learning Representations (ICLR), 2021
  6. MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training
    Beidi Chen, Zichang Liu, Binghui Peng, Zhaozhuo Xu, Jonathan Lingjie Li, Tri Dao, Zhao Song, Anshumali Shrivastava, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2021

2020

  1. Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
    Tri Dao, Nimit Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2020
  2. HiPPO: Recurrent Memory with Optimal Polynomial Projections
    Albert Gu*, Tri Dao*, Stefano Ermon, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2020

2019

  1. On the Downstream Performance of Compressed Word Embeddings
    Avner May, Jian Zhang, Tri Dao, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2019
  2. Approximating the Permanent by Sampling from Adaptive Partitions
    Jonathan Kuck, Tri Dao, Hamid Rezatofighi, Ashish Sabharwal, and Stefano Ermon
    In Advances in Neural Information Processing Systems (NeurIPS), 2019
  3. Adaptive Hashing for Model Counting
    Jonathan Kuck, Tri Dao, Shengjia Zhao, Burak Bartan, Ashish Sabharwal, and Stefano Ermon
    In Conference on Uncertainty in Artificial Intelligence (UAI), 2019
  4. Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
    Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2019
  5. A Kernel Theory of Modern Data Augmentation
    Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2019
  6. Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation
    Jian Zhang, Avner May, Tri Dao, and Christopher Ré
    In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019

2018

  1. Learning Compressed Transforms with Low Displacement Rank
    Anna T. Thomas, Albert Gu, Tri Dao, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2018

2017

  1. Gaussian Quadrature for Kernel Features
    Tri Dao, Christopher M. De Sa, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2017