Publications

2024

  1. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
    Tri Dao* and Albert Gu*
    In International Conference on Machine Learning (ICML), 2024
  2. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
    Tri Dao
    In International Conference on Learning Representations (ICLR), 2024

2023

  1. Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    Albert Gu* and Tri Dao*
    arXiv preprint arXiv:2312.00752, 2023
  2. Hungry Hungry Hippos: Towards Language Modeling with State Space Models
    Tri Dao*, Daniel Y. Fu*, Khaled K. Saab, Armin W. Thomas, Atri Rudra, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2023

2022

  1. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
    Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  2. Monarch: Expressive Structured Matrices for Efficient and Accurate Training
    Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2022
  3. Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models
    Tri Dao*, Beidi Chen*, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2022
  4. Efficiently Modeling Long Sequences with Structured State Spaces
    Albert Gu, Karan Goel, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2022
  5. S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces
    Eric Nguyen, Karan Goel, Albert Gu, Gordon Downs, Preey Shah, Tri Dao, Stephen Baccus, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2022

2021

  1. Scatterbrain: Unifying Sparse and Low-rank Attention
    Beidi Chen*, Tri Dao*, Eric Winsor, Zhao Song, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  2. Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers
    Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  3. Rethinking Neural Operations for Diverse Tasks
    Nicholas Roberts, Mikhail Khodak, Tri Dao, Liam Li, Christopher Ré, and Ameet Talwalkar
    In Advances in Neural Information Processing Systems (NeurIPS), 2021

2020

  1. HiPPO: Recurrent Memory with Optimal Polynomial Projections
    Albert Gu*, Tri Dao*, Stefano Ermon, Atri Rudra, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2020
  2. Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
    Tri Dao, Nimit Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, and Christopher Ré
    In International Conference on Learning Representations (ICLR), 2020

2019

  1. Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
    Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, and Christopher Ré
    In International Conference on Machine Learning (ICML), 2019