Publications
2025
- Marconi: Prefix Caching for the Era of Hybrid LLMs. In Machine Learning and Systems (MLSys), 2025
- Hardware-Efficient Attention for Fast Decoding. In Conference on Language Modeling (COLM), 2025
- Long-Context State-Space Video World Models. In International Conference on Computer Vision (ICCV), 2025
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models. arXiv preprint arXiv:2504.10449, 2025
- HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model. arXiv preprint arXiv:2502.10807, 2025
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners. arXiv preprint arXiv:2502.20339, 2025
- Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping. In International Conference on Machine Learning (ICML), 2025
 
2024
- RedPajama: An Open Dataset for Training Large Language Models. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- BitDelta: Your Fine-Tune May Only Be Worth One Bit. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- An Empirical Study of Mamba-based Language Models. arXiv preprint arXiv:2406.07887, 2024
- StarCoder 2 and The Stack v2: The Next Generation. arXiv preprint arXiv:2402.19173, 2024
- Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling. In International Conference on Machine Learning (ICML), 2024
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality. In International Conference on Machine Learning (ICML), 2024
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads. In International Conference on Machine Learning (ICML), 2024
 
2023
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces. In Conference on Language Modeling (COLM), 2023
- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time. In International Conference on Machine Learning (ICML), 2023
- Effectively Modeling Time Series with Simple Discrete State Spaces. In International Conference on Learning Representations (ICLR), 2023
- Hyena Hierarchy: Towards Larger Convolutional Language Models. In International Conference on Machine Learning (ICML), 2023
- Simple Hardware-Efficient Long Convolutions for Sequence Modeling. In International Conference on Machine Learning (ICML), 2023
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models. In International Conference on Learning Representations (ICLR), 2023
 
2022
- Monarch: Expressive Structured Matrices for Efficient and Accurate Training. In International Conference on Machine Learning (ICML), 2022
- Decentralized Training of Foundation Models in Heterogeneous Environments. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- Fine-Tuning Language Models over Slow Networks Using Activation Compression with Guarantees. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- Transform Once: Efficient Operator Learning in Frequency Domain. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- ButterflyFlow: Building Invertible Layers with Butterfly Matrices. In International Conference on Machine Learning (ICML), 2022
- Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models. In International Conference on Learning Representations (ICLR), 2022
 
2021
- Scatterbrain: Unifying Sparse and Low-Rank Attention. In Advances in Neural Information Processing Systems (NeurIPS), 2021
- Combining Recurrent, Convolutional, and Continuous-Time Models with Linear State Space Layers. In Advances in Neural Information Processing Systems (NeurIPS), 2021
- Rethinking Neural Operations for Diverse Tasks. In Advances in Neural Information Processing Systems (NeurIPS), 2021
- Catformer: Designing Stable Transformers via Sensitivity Analysis. In International Conference on Machine Learning (ICML), 2021
- Knowledge Distillation as Semiparametric Inference. In International Conference on Learning Representations (ICLR), 2021
- MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training. In International Conference on Learning Representations (ICLR), 2021
 
2020
- HiPPO: Recurrent Memory with Optimal Polynomial Projections. In Advances in Neural Information Processing Systems (NeurIPS), 2020
- Kaleidoscope: An Efficient, Learnable Representation for All Structured Linear Maps. In International Conference on Learning Representations (ICLR), 2020
 
2019
- On the Downstream Performance of Compressed Word Embeddings. In Advances in Neural Information Processing Systems (NeurIPS), 2019
- Approximating the Permanent by Sampling from Adaptive Partitions. In Advances in Neural Information Processing Systems (NeurIPS), 2019
- Adaptive Hashing for Model Counting. In Conference on Uncertainty in Artificial Intelligence (UAI), 2019
- Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations. In International Conference on Machine Learning (ICML), 2019
- A Kernel Theory of Modern Data Augmentation. In International Conference on Machine Learning (ICML), 2019
- Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
 
2018
- Learning Compressed Transforms with Low Displacement Rank. In Advances in Neural Information Processing Systems (NeurIPS), 2018
 
2017
- Gaussian Quadrature for Kernel Features. In Advances in Neural Information Processing Systems (NeurIPS), 2017