Publications
2025
- Marconi: Prefix Caching for the Era of Hybrid LLMs. In Machine Learning and Systems (MLSys), 2025
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models. arXiv preprint arXiv:2504.10449, 2025
- HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model. arXiv preprint arXiv:2502.10807, 2025
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners. arXiv preprint arXiv:2502.20339, 2025
- Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping. In International Conference on Machine Learning (ICML), 2025
2024
- RedPajama: An Open Dataset for Training Large Language Models. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- BitDelta: Your Fine-Tune May Only Be Worth One Bit. In Advances in Neural Information Processing Systems (NeurIPS), 2024
- An Empirical Study of Mamba-based Language Models. arXiv preprint arXiv:2406.07887, 2024
- StarCoder 2 and The Stack v2: The Next Generation. arXiv preprint arXiv:2402.19173, 2024
- Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling. In International Conference on Machine Learning (ICML), 2024
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality. In International Conference on Machine Learning (ICML), 2024
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads. In International Conference on Machine Learning (ICML), 2024
2023
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces. In Conference on Language Modeling (COLM), 2024
- Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time. In International Conference on Machine Learning (ICML), 2023
- Effectively Modeling Time Series with Simple Discrete State Spaces. In International Conference on Learning Representations (ICLR), 2023
- Hyena Hierarchy: Towards Larger Convolutional Language Models. In International Conference on Machine Learning (ICML), 2023
- Simple Hardware-Efficient Long Convolutions for Sequence Modeling. In International Conference on Machine Learning (ICML), 2023
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models. In International Conference on Learning Representations (ICLR), 2023
2022
- Monarch: Expressive Structured Matrices for Efficient and Accurate Training. In International Conference on Machine Learning (ICML), 2022
- Decentralized Training of Foundation Models in Heterogeneous Environments. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- Fine-Tuning Language Models over Slow Networks using Activation Compression with Guarantees. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- Transform Once: Efficient Operator Learning in Frequency Domain. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces. In Advances in Neural Information Processing Systems (NeurIPS), 2022
- ButterflyFlow: Building Invertible Layers with Butterfly Matrices. In International Conference on Machine Learning (ICML), 2022
- Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models. In International Conference on Learning Representations (ICLR), 2022
2021
- Scatterbrain: Unifying Sparse and Low-Rank Attention. In Advances in Neural Information Processing Systems (NeurIPS), 2021
- Combining Recurrent, Convolutional, and Continuous-Time Models with Linear State Space Layers. In Advances in Neural Information Processing Systems (NeurIPS), 2021
- Rethinking Neural Operations for Diverse Tasks. In Advances in Neural Information Processing Systems (NeurIPS), 2021
- Catformer: Designing Stable Transformers via Sensitivity Analysis. In International Conference on Machine Learning (ICML), 2021
- Knowledge Distillation as Semiparametric Inference. In International Conference on Learning Representations (ICLR), 2021
- MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training. In International Conference on Learning Representations (ICLR), 2021
2020
- HiPPO: Recurrent Memory with Optimal Polynomial Projections. In Advances in Neural Information Processing Systems (NeurIPS), 2020
- Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps. In International Conference on Learning Representations (ICLR), 2020
2019
- On the Downstream Performance of Compressed Word Embeddings. In Advances in Neural Information Processing Systems (NeurIPS), 2019
- Approximating the Permanent by Sampling from Adaptive Partitions. In Advances in Neural Information Processing Systems (NeurIPS), 2019
- Adaptive Hashing for Model Counting. In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI), 2019
- Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations. In International Conference on Machine Learning (ICML), 2019
- A Kernel Theory of Modern Data Augmentation. In International Conference on Machine Learning (ICML), 2019
- Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
2018
- Learning Compressed Transforms with Low Displacement Rank. In Advances in Neural Information Processing Systems (NeurIPS), 2018
2017
- Gaussian Quadrature for Kernel Features. In Advances in Neural Information Processing Systems (NeurIPS), 2017