Publications
2025
- Marconi: Prefix Caching for the Era of Hybrid LLMs. In Machine Learning and Systems (MLSys), 2025
- Hardware-Efficient Attention for Fast Decoding. In Conference on Language Modeling (COLM), 2025
- Long-Context State-Space Video World Models. In International Conference on Computer Vision (ICCV), 2025
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models. arXiv preprint arXiv:2504.10449, 2025
- HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model. arXiv preprint arXiv:2502.10807, 2025
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners. arXiv preprint arXiv:2502.20339, 2025
- Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping. In International Conference on Machine Learning (ICML), 2025
 
2024
- RedPajama: An Open Dataset for Training Large Language Models. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- BitDelta: Your Fine-Tune May Only Be Worth One Bit. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- An Empirical Study of Mamba-based Language Models. arXiv preprint arXiv:2406.07887, 2024
- StarCoder 2 and The Stack v2: The Next Generation. arXiv preprint arXiv:2402.19173, 2024
- Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling. In International Conference on Machine Learning (ICML), 2024
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality. In International Conference on Machine Learning (ICML), 2024
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads. In International Conference on Machine Learning (ICML), 2024
 
2023
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces. In Conference on Language Modeling (COLM), 2023
- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time. In International Conference on Machine Learning (ICML), 2023
- Effectively Modeling Time Series with Simple Discrete State Spaces. In International Conference on Learning Representations (ICLR), 2023
- Hyena Hierarchy: Towards Larger Convolutional Language Models. In International Conference on Machine Learning (ICML), 2023
- Simple Hardware-Efficient Long Convolutions for Sequence Modeling. In International Conference on Machine Learning (ICML), 2023
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models. In International Conference on Learning Representations (ICLR), 2023
 
2022
- Monarch: Expressive Structured Matrices for Efficient and Accurate Training. In International Conference on Machine Learning (ICML), 2022
- Decentralized Training of Foundation Models in Heterogeneous Environments. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- Fine-Tuning Language Models over Slow Networks Using Activation Compression with Guarantees. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- Transform Once: Efficient Operator Learning in Frequency Domain. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- ButterflyFlow: Building Invertible Layers with Butterfly Matrices. In International Conference on Machine Learning (ICML), 2022
- Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models. In International Conference on Learning Representations (ICLR), 2022
 
2021
- Scatterbrain: Unifying Sparse and Low-Rank Attention. In Advances in Neural Information Processing Systems (NeurIPS), 2021
- Combining Recurrent, Convolutional, and Continuous-Time Models with Linear State Space Layers. In Advances in Neural Information Processing Systems (NeurIPS), 2021
- Rethinking Neural Operations for Diverse Tasks. In Advances in Neural Information Processing Systems (NeurIPS), 2021
- Catformer: Designing Stable Transformers via Sensitivity Analysis. In International Conference on Machine Learning (ICML), 2021
- Knowledge Distillation as Semiparametric Inference. In International Conference on Learning Representations (ICLR), 2021
- MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training. In International Conference on Learning Representations (ICLR), 2021
 
2020
- HiPPO: Recurrent Memory with Optimal Polynomial Projections. In Advances in Neural Information Processing Systems (NeurIPS), 2020
- Kaleidoscope: An Efficient, Learnable Representation for All Structured Linear Maps. In International Conference on Learning Representations (ICLR), 2020
 
2019
- On the Downstream Performance of Compressed Word Embeddings. In Advances in Neural Information Processing Systems (NeurIPS), 2019
- Approximating the Permanent by Sampling from Adaptive Partitions. In Advances in Neural Information Processing Systems (NeurIPS), 2019
- Adaptive Hashing for Model Counting. In Conference on Uncertainty in Artificial Intelligence (UAI), 2019
- Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations. In International Conference on Machine Learning (ICML), 2019
- A Kernel Theory of Modern Data Augmentation. In International Conference on Machine Learning (ICML), 2019
- Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
 
2018
- Learning Compressed Transforms with Low Displacement Rank. In Advances in Neural Information Processing Systems (NeurIPS), 2018
 
2017
- Gaussian Quadrature for Kernel Features. In Advances in Neural Information Processing Systems (NeurIPS), 2017