I am a Principal Research Scientist/Director at Meta/FAIR in Menlo Park, where I work on machine learning, speech processing, and NLP. This work has resulted in projects such as wav2vec, the fairseq toolkit, the first modern convolutional seq2seq models to outperform RNNs, and top-ranked submissions to the WMT news translation task in 2018 and 2019. Before that I was at Microsoft Research, where I did early work on neural machine translation and neural dialogue models. I earned my Ph.D. at the University of Edinburgh, where I was advised by Adam Lopez and Philipp Koehn.


News

  • MMS scales speech technology to 1,000+ languages and provides language identification for over 4,000 languages.
  • data2vec 2.0 enables pre-training for speech, vision and NLP at up to 16x the speed of existing algorithms.
  • We released data2vec, a single self-supervised learning algorithm that achieves high performance for vision, speech and language.

Selected Papers (See Google Scholar for full list)

Scaling Speech Technology to 1,000+ Languages
Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli. In JMLR, 2024.
Abstract Blog Code
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli. In ICML, 2023.
Abstract Blog Code
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli. arXiv preprint, 2022.
Abstract Blog Code
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli. arXiv preprint, 2021.
Abstract Blog Code
Unsupervised Speech Recognition
Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli. In Proc. of NeurIPS, 2021.
Abstract Blog Code
Beyond English-Centric Multilingual Machine Translation
Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli*, Armand Joulin*. In JMLR, 2020.
Abstract Blog Code
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. In NeurIPS, 2020.
Abstract Blog Code
wav2vec: Unsupervised Pre-training for Speech Recognition
Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli. In Proc. of Interspeech, 2019.
Abstract Blog Code
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli. In Proc. of NAACL, Demonstrations, 2019.
Abstract Code
Pay Less Attention with Lightweight and Dynamic Convolutions
Felix Wu, Angela Fan, Alexei Baevski, Yann N Dauphin, Michael Auli. In Proc. of ICLR, 2019.
Abstract Code
Understanding Back-Translation at Scale
Sergey Edunov, Myle Ott, David Grangier, Michael Auli. In Proc. of EMNLP, 2018.
Abstract Code
Convolutional Sequence to Sequence Learning
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin. In Proc. of ICML, 2017.
Abstract Blog Code
Sequence Level Training with Recurrent Neural Networks
Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. In Proc. of ICLR, 2016.
Abstract Code



Talks

Unified self-supervised learning for speech, vision and NLP
Talk at ECCV Workshop on Perception
wav2vec: Self-supervised learning of speech representations
Talk at MIT, CMU, U of Edinburgh, Spring 2021.
Efficient Sequence Modeling
Talk at WNGT'19, Stanford, Berkeley, Nov 2019.
Sequence to Sequence Learning: Fast Training and Inference with Gated Convolutions
Talk at Johns Hopkins University, Oct 2017.
Learning to translate with neural networks
Talk at Facebook, Google, Amazon and the University of Washington, 2014.
Integrated Parsing and Tagging
Talk at Carnegie Mellon University, Johns Hopkins University, BBN Technologies, IBM Research and Microsoft Research, 2011.