Tri Dao
Previously: PhD, Department of Computer Science, Stanford University
Email: tri [at] tridao (dot) me
To prospective PhD students: Please apply to the Princeton PhD program and mention my name in your research statement.
Machine learning and systems, with a focus on efficient training and long-range context:
Efficient Transformer training and inference.
Sequence models with long-range memory.
Structured sparsity for compact deep learning models.
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao, July 2023.
[Paper]
[Code]
StarCoder: may the source be with you!
The BigCode project.
In Transactions on Machine Learning Research, November 2023.
[Paper]
Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli*, Stefano Massaroli*, Eric Nguyen*, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré.
In ICML: 40th International Conference on Machine Learning, July 2023. Oral
[Paper]
[Code]
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Ré, Beidi Chen.
In ICML: 40th International Conference on Machine Learning, July 2023. Oral
[Paper]
[Code]
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu*, Elliot L. Epstein*, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré.
In ICML: 40th International Conference on Machine Learning, July 2023.
[Paper]
[Code]
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Tri Dao*, Daniel Y. Fu*, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré.
In ICLR: 11th International Conference on Learning Representations, May 2023. Spotlight
[Paper]
[Code]
Effectively Modeling Time Series with Simple Discrete State Spaces
Michael Zhang, Khaled Kamal Saab, Michael Poli, Tri Dao, Karan Goel, Christopher Ré.
In ICLR: 11th International Conference on Learning Representations, May 2023.
[Paper]
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré.
In Hardware Aware Efficient Training Workshop at ICML, 2022. Best Paper Award
In Sparsity in Neural Networks Workshop, 2022. Oral presentation
In NeurIPS: Proceedings of the 35th Neural Information Processing Systems Conference, December 2022.
[Paper]
[Code]
[IEEE Spectrum and Forbes articles about our submission to the MLPerf 2.0 benchmark using FlashAttention]
Usage: We've been very happy to see FlashAttention being widely adopted in such a short
time after its release. This page
contains a partial list of places where FlashAttention is being used.
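For readers who want to try it out, below is a minimal usage sketch. It assumes the pip-installable flash-attn package and its flash_attn_func interface (FlashAttention 2.x), a CUDA GPU, and fp16/bf16 inputs; treat the exact import path and argument names as assumptions rather than a definitive guide. The reference computation at the end is ordinary softmax attention, included only to illustrate that FlashAttention computes exact (not approximate) attention.

```python
# Minimal sketch (assumptions: the pip-installable `flash-attn` package exposes
# `flash_attn_func`, a CUDA GPU is available, and inputs are fp16/bf16).
import torch
from flash_attn import flash_attn_func  # assumed import path for flash-attn >= 2.x

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
# Tensors are laid out as (batch, seqlen, nheads, headdim), in half precision on GPU.
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact attention, computed without materializing the seqlen x seqlen
# attention matrix in GPU HBM.
out = flash_attn_func(q, k, v, causal=True)  # -> (batch, seqlen, nheads, headdim)

# Reference check against standard causal attention (same math, different memory behavior).
scale = headdim ** -0.5
scores = torch.einsum("bshd,bthd->bhst", q.float(), k.float()) * scale
mask = torch.ones(seqlen, seqlen, dtype=torch.bool, device="cuda").triu(1)
scores = scores.masked_fill(mask, float("-inf"))
ref = torch.einsum("bhst,bthd->bshd", scores.softmax(dim=-1), v.float())
print(torch.allclose(out.float(), ref, atol=1e-2))
```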
Decentralized Training of Foundation Models in Heterogeneous Environments
Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, Ce Zhang.
In NeurIPS: Proceedings of the 35th Neural Information Processing Systems Conference, December 2022. Oral
[Paper]
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang.
In NeurIPS: Proceedings of the 35th Neural Information Processing Systems Conference, December 2022.
[Paper]
Transform Once: Efficient Operator Learning in Frequency Domain
Michael Poli, Stefano Massaroli, Federico Berto, Jinkyoo Park, Tri Dao, Christopher Ré, Stefano Ermon.
In NeurIPS: Proceedings of the 35th Neural Information Processing Systems Conference, December 2022.
[Paper]
S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces
Eric Nguyen, Karan Goel, Albert Gu, Gordon Downs, Preey Shah, Tri Dao, Stephen Baccus, Christopher Ré.
In NeurIPS: Proceedings of the 35th Neural Information Processing Systems Conference, December 2022.
[arXiv]
Monarch: Expressive Structured Matrices for Efficient and Accurate Training
Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré.
In ICML: 39th International Conference on Machine Learning, July 2022. Outstanding Paper Runner-up
[Paper]
[Code]
[Poster] Outstanding Poster Award at the EfficientML Bay Area meetup
ButterflyFlow: Building Invertible Layers with Butterfly Matrices
Chenlin Meng, Linqi Zhou, Kristy Choi, Tri Dao, Stefano Ermon.
In ICML: 39th International Conference on Machine Learning, July 2022.
[Paper]
Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models
Tri Dao*, Beidi Chen*, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré.
In ICLR: 10th International Conference on Learning Representations, May 2022. Spotlight
[arXiv]
[Code]
[Blogpost]
Scatterbrain: Unifying Sparse and Low-rank Attention
Beidi Chen*, Tri Dao*, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré.
In NeurIPS: Proceedings of the 34th Neural Information Processing Systems Conference, December 2021.
[arXiv]
[Code]
Combining Recurrent, Convolutional, and Continuous-time Models with Structured Learned Linear State-Space Layers
Albert Gu, Isys Johnson, Karan Goel, Khaled Kamal Saab, Tri Dao, Atri Rudra, Christopher Ré.
In NeurIPS: Proceedings of the 34th Neural Information Processing Systems Conference, December 2021.
[arXiv]
[Code]
Rethinking Neural Operations for Diverse Tasks
Nicholas Roberts*, Mikhail Khodak*, Tri Dao, Liam Li, Christopher Ré, Ameet Talwalkar.
In NeurIPS: Proceedings of the 34th Neural Information Processing Systems Conference, December 2021.
[arXiv]
[Code]
[Python package]
Catformer: Designing Stable Transformers via Sensitivity Analysis
Jared Q Davis*, Albert Gu*, Krzysztof Choromanski, Tri Dao, Christopher Ré, Chelsea Finn, Percy Liang.
In ICML: 38th International Conference on Machine Learning, July 2021.
Knowledge Distillation as Semiparametric Inference
Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey.
In ICLR: 9th International Conference on Learning Representations, May 2021.
[arXiv]
[Code]
MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training
Beidi Chen, Zichang Liu, Binghui Peng, Zhaozhuo Xu, Jonathan Lingjie Li, Tri Dao, Zhao Song, Anshumali Shrivastava, Christopher Ré.
In ICLR: 9th International Conference on Learning Representations, May 2021. Oral
[Code]
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu*, Tri Dao*, Stefano Ermon, Atri Rudra, Christopher Ré.
In NeurIPS: Proceedings of the 33rd Neural Information Processing Systems Conference, December 2020. Spotlight
[arXiv]
[Code]
[Blogpost]
Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
Tri Dao, Nimit Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, Christopher Ré.
In ICLR: 8th International Conference on Learning Representations, April 2020. Spotlight
[arXiv]
[Code]
[Blogpost]
[Video]
[Slides]
On the Downstream Performance of Compressed Word Embeddings
Avner May, Jian Zhang, Tri Dao, Christopher Ré.
In NeurIPS: Proceedings of the 32nd Neural Information Processing Systems Conference, December 2019. Spotlight
[arXiv]
[Code]
Approximating the Permanent by Sampling from Adaptive Partitions
Jonathan Kuck, Tri Dao, Hamid Rezatofighi, Ashish Sabharwal, Stefano Ermon.
In NeurIPS: Proceedings of the 32nd Neural Information Processing Systems Conference, December 2019.
[arXiv]
Adaptive Hashing for Model Counting
Jonathan Kuck, Tri Dao, Shengjia Zhao, Burak Bartan, Ashish Sabharwal, Stefano Ermon.
In UAI: The Conference on Uncertainty in Artificial Intelligence, July 2019.
[Supplement]
[Code]
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher Ré.
In ICML: The 36th International Conference on Machine Learning, June 2019. Full Oral Presentation
[arXiv]
[Code]
[Blogpost]
[Poster]
A Kernel Theory of Modern Data Augmentation
Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré.
In ICML: The 36th International Conference on Machine Learning, June 2019.
[arXiv]
[Code]
[Poster]
[Slides]
Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation
Jian Zhang*, Avner May*, Tri Dao, Christopher Ré.
In AISTATS: The 22nd International Conference on Artificial Intelligence and Statistics, April 2019.
[arXiv]
[Code]
Learning Compressed Transforms with Low Displacement Rank
Anna T. Thomas*, Albert Gu*, Tri Dao, Atri Rudra, Christopher Ré.
In NeurIPS: Proceedings of the 31st Neural Information Processing Systems Conference, December 2018.
[arXiv]
[Code]
[Poster]
Preliminary version:
Learning Invariance with Compact Transforms
Anna T. Thomas, Albert Gu, Tri Dao, Atri Rudra, Christopher Ré.
In ICLR: 6th International Conference on Learning Representations, Workshop track, April 2018.
Gaussian Quadrature for Kernel Features
Tri Dao, Christopher De Sa, Christopher Ré.
In NeurIPS: Proceedings of the 30th Neural Information Processing Systems Conference, December 2017. Spotlight
[arXiv]
[Code]
[Video]
[Poster]
[Slides]
Teaching assistant at Stanford University for:
Probabilistic Graphical Models (CS228), Winter 2020
Machine Learning (CS229), Spring 2019
Convex Optimization I (EE364A), Winter 2016
Introduction to Matrix Methods (EE103), Fall 2015