Publications

2026

  1. FlashAttention-4: Algorithm and Kernel Pipelining Co-design for Asymmetric Hardware Scaling
    Ted Zadouri*, Jay Shah*, Markus Hohnerbach*, and 3 more authors
    In Machine Learning and Systems (MLSys), 2026
  2. Mamba-3: Improved Sequence Modeling using State Space Principles
    Aakash Lahoti, Kevin Li, Berlin Chen, and 5 more authors
    In International Conference on Learning Representations (ICLR), 2026
  3. SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
    Wentao Guo, Mayank Mishra, Xinle Cheng, and 2 more authors
    In International Conference on Learning Representations (ICLR), 2026
  4. Beat the Long Tail: Distribution-Aware Speculative Decoding for RL Training
    Zelei Shao, Vikranth Srivatsa, Sanjana Srivastava, and 8 more authors
    In Machine Learning and Systems (MLSys), 2026
  5. Speculative Speculative Decoding
    Tanishq Kumar, Tri Dao, and Avner May
    In International Conference on Learning Representations (ICLR), 2026
  6. Log-Linear Attention
    Han Guo, Songlin Yang, Tarushii Goel, and 3 more authors
    In International Conference on Learning Representations (ICLR), 2026
  7. M^2RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
    Mayank Mishra, Shawn Tan, Ion Stoica, and 2 more authors
    arXiv preprint arXiv:2603.14360, 2026
  8. AI+HW 2035: Shaping the Next Decade
    Deming Chen, Jason Cong, Azalia Mirhoseini, and 27 more authors
    arXiv preprint arXiv:2603.05225, 2026
  9. When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
    Junxiong Wang, Fengxiang Bie, Jisen Li, and 15 more authors
    arXiv preprint arXiv:2602.06932, 2026

2025

  1. Marconi: Prefix Caching for the Era of Hybrid LLMs
    Rui Pan, Zhuang Wang, Zhen Jia, and 5 more authors
    In Machine Learning and Systems (MLSys), 2025
  2. Hardware-Efficient Attention for Fast Decoding
    Ted Zadouri, Hubert Strauss, and Tri Dao
    In Conference on Language Modeling (COLM), 2025
  3. Long-Context State-Space Video World Models
    Ryan Po, Yotam Nitzan, Richard Zhang, and 5 more authors
    In International Conference on Computer Vision (ICCV), 2025
  4. M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
    Junxiong Wang, Wen-Ding Li, Daniele Paliotta, and 3 more authors
    arXiv preprint arXiv:2504.10449, 2025
  5. HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model
    Mingqian Ma, Guoqing Liu, Chuan Cao, and 8 more authors
    arXiv preprint arXiv:2502.10807, 2025
  6. Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
    Daniele Paliotta, Junxiong Wang, Matteo Pagliardini, and 6 more authors
    arXiv preprint arXiv:2502.20339, 2025
  7. Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping
    Muru Zhang, Mayank Mishra, Zhongzhu Zhou, and 7 more authors
    In International Conference on Machine Learning (ICML), 2025
  8. Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining
    Costin-Andrei Oncescu, Qingyang Wu, Wai Tong Chung, and 5 more authors
    arXiv preprint arXiv:2511.02237, 2025

2024

  1. FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
    Jay Shah*, Ganesh Bikshandi*, Ying Zhang, and 3 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  2. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
    Tri Dao* and Albert Gu*
    In International Conference on Machine Learning (ICML), 2024
  3. RedPajama: an Open Dataset for Training Large Language Models
    Maurice Weber, Daniel Fu, Quentin Anthony, and 16 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  4. The Mamba in the Llama: Distilling and Accelerating Hybrid Models
    Junxiong Wang, Daniele Paliotta, Avner May, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  5. Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
    Sukjun Hwang, Aakash Lahoti, Tri Dao, and 1 more author
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  6. BitDelta: Your Fine-Tune May Only Be Worth One Bit
    James Liu, Guangxuan Xiao, Kai Li, and 4 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  7. An Empirical Study of Mamba-based Language Models
    Roger Waleffe, Wonmin Byeon, Duncan Riach, and 8 more authors
    arXiv preprint arXiv:2406.07887, 2024
  8. StarCoder 2 and The Stack v2: The Next Generation
    Anton Lozhkov, Raymond Li, Loubna Ben Allal, and 8 more authors
    arXiv preprint arXiv:2402.19173, 2024
  9. Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
    Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, and 3 more authors
    In International Conference on Machine Learning (ICML), 2024
  10. Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
    Tianle Cai, Yuhong Li, Zhengyang Geng, and 4 more authors
    In International Conference on Machine Learning (ICML), 2024
  11. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
    Tri Dao
    In International Conference on Learning Representations (ICLR), 2024

2023

  1. Mamba: Linear-Time Sequence Modeling with Selective State Spaces
    Albert Gu* and Tri Dao*
    In Conference on Language Modeling (COLM), 2024
  2. Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
    Zichang Liu, Jue Wang, Tri Dao, and 8 more authors
    In International Conference on Machine Learning (ICML), 2023
  3. Effectively Modeling Time Series with Simple Discrete State Spaces
    Michael Zhang, Khaled K. Saab, Michael Poli, and 3 more authors
    In International Conference on Learning Representations (ICLR), 2023
  4. Hyena Hierarchy: Towards Larger Convolutional Language Models
    Michael Poli*, Stefano Massaroli*, Eric Nguyen, and 6 more authors
    In International Conference on Machine Learning (ICML), 2023
  5. Simple Hardware-Efficient Long Convolutions for Sequence Modeling
    Daniel Y. Fu*, Elliot L Epstein*, Eric Nguyen, and 5 more authors
    In International Conference on Machine Learning (ICML), 2023
  6. Hungry Hungry Hippos: Towards Language Modeling with State Space Models
    Tri Dao*, Daniel Y. Fu*, Khaled K. Saab, and 3 more authors
    In International Conference on Learning Representations (ICLR), 2023
  7. StarCoder: may the source be with you!
    Raymond Li, Loubna Ben Allal, Yangtian Zi, and 64 more authors
    Transactions on Machine Learning Research (TMLR), 2023
  8. Flash-Decoding for long-context inference
    Tri Dao, Daniel Haziza, Francisco Massa, and 1 more author
    PyTorch Blog, 2023

2022

  1. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
    Tri Dao, Daniel Y. Fu, Stefano Ermon, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  2. Monarch: Expressive Structured Matrices for Efficient and Accurate Training
    Tri Dao, Beidi Chen, Nimit Sohoni, and 7 more authors
    In International Conference on Machine Learning (ICML), 2022
  3. Decentralized Training of Foundation Models in Heterogeneous Environments
    Binhang Yuan, Yongjun He, Jared Quincy Davis, and 6 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  4. Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
    Jue Wang, Binhang Yuan, Luka Rimanic, and 6 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  5. Transform Once: Efficient Operator Learning in Frequency Domain
    Michael Poli, Stefano Massaroli, Federico Berto, and 4 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  6. S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces
    Eric Nguyen, Karan Goel, Albert Gu, and 5 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2022
  7. ButterflyFlow: Building Invertible Layers with Butterfly Matrices
    Chenlin Meng, Linqi Zhou, Kristy Choi, and 2 more authors
    In International Conference on Machine Learning (ICML), 2022
  8. Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models
    Tri Dao*, Beidi Chen*, Kaizhao Liang, and 4 more authors
    In International Conference on Learning Representations (ICLR), 2022

2021

  1. Scatterbrain: Unifying Sparse and Low-rank Attention
    Beidi Chen*, Tri Dao*, Eric Winsor, and 3 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  2. Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers
    Albert Gu, Isys Johnson, Karan Goel, and 4 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  3. Rethinking Neural Operations for Diverse Tasks
    Nicholas Roberts, Mikhail Khodak, Tri Dao, and 3 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2021
  4. Catformer: Designing Stable Transformers via Sensitivity Analysis
    Jared Q Davis*, Albert Gu*, Krzysztof Choromanski, and 4 more authors
    In International Conference on Machine Learning (ICML), 2021
  5. Knowledge Distillation as Semiparametric Inference
    Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, and 1 more author
    In International Conference on Learning Representations (ICLR), 2021
  6. MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training
    Beidi Chen, Zichang Liu, Binghui Peng, and 6 more authors
    In International Conference on Learning Representations (ICLR), 2021

2020

  1. HiPPO: Recurrent Memory with Optimal Polynomial Projections
    Albert Gu*, Tri Dao*, Stefano Ermon, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2020
  2. Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
    Tri Dao, Nimit Sohoni, Albert Gu, and 5 more authors
    In International Conference on Learning Representations (ICLR), 2020

2019

  1. On the Downstream Performance of Compressed Word Embeddings
    Avner May, Jian Zhang, Tri Dao, and 1 more author
    In Advances in Neural Information Processing Systems (NeurIPS), 2019
  2. Approximating the Permanent by Sampling from Adaptive Partitions
    Jonathan Kuck, Tri Dao, Hamid Rezatofighi, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2019
  3. Adaptive Hashing for Model Counting
    Jonathan Kuck, Tri Dao, Shengjia Zhao, and 3 more authors
    In Conference on Uncertainty in Artificial Intelligence (UAI), 2019
  4. Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
    Tri Dao, Albert Gu, Matthew Eichhorn, and 2 more authors
    In International Conference on Machine Learning (ICML), 2019
  5. A Kernel Theory of Modern Data Augmentation
    Tri Dao, Albert Gu, Alexander J Ratner, and 3 more authors
    In International Conference on Machine Learning (ICML), 2019
  6. Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation
    Jian Zhang, Avner May, Tri Dao, and 1 more author
    In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019

2018

  1. Learning Compressed Transforms with Low Displacement Rank
    Anna T Thomas, Albert Gu, Tri Dao, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2018

2017

  1. Gaussian Quadrature for Kernel Features
    Tri Dao, Christopher M. De Sa, and Christopher Ré
    In Advances in Neural Information Processing Systems (NeurIPS), 2017