I am a Principal Research Scientist/Director at Facebook AI Research in Menlo Park, where I work on speech processing and NLP. This work has resulted in projects such as wav2vec, the fairseq toolkit, the first modern convolutional seq2seq models outperforming RNNs, and top-ranked submissions at the WMT news translation task in 2018 and 2019. Before that I was at Microsoft Research, where I did early work on neural machine translation and neural dialogue models. I earned my Ph.D. at the University of Edinburgh, where I was advised by Adam Lopez and Philipp Koehn.


  • We released data2vec, a single self-supervised learning algorithm that achieves high performance for vision, speech and language.
  • XLS-R provides self-supervised speech representations for 128 languages and sets a new state of the art for speech translation, language identification and several ASR benchmarks.
  • wav2vec-U enables speech recognition performance competitive with the best systems from only two years ago that were trained on 960h of labeled data, and was accepted for oral presentation at NeurIPS 2021.
  • I recently gave a talk about wav2vec at MIT, CMU and the University of Edinburgh.
  • wav2vec 2.0 enables speech recognition systems using just 10 minutes of transcribed data. Applying it to cross-lingual training gives nice results on CommonVoice and BABEL.

Selected Papers (See Google Scholar for full list)

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski, Arun Babu, Wei-Ning Hsu, Michael Auli. In arXiv, 2022.
Abstract Blog Code
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli. In arXiv, 2022.
Abstract Blog Code
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli. In arXiv, 2021.
Abstract Blog Code
Unsupervised Speech Recognition
Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli. In Proc. of NeurIPS, 2021.
Abstract Blog Code
Beyond English-Centric Multilingual Machine Translation
Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli*, Armand Joulin*. In JMLR, 2020.
Abstract Blog Code
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. In Proc. of NeurIPS, 2020.
Abstract Blog Code
wav2vec: Unsupervised Pre-training for Speech Recognition
Steffen Schneider, Alexei Baevski, Ronan Collobert, Michael Auli. In Proc. of Interspeech, 2019.
Abstract Blog Code
fairseq: A fast, extensible toolkit for sequence modeling
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli. In Proc. of NAACL, Demonstrations, 2019.
Abstract Code
Pay Less Attention with Lightweight and Dynamic Convolutions
Felix Wu, Angela Fan, Alexei Baevski, Yann N Dauphin, Michael Auli. In Proc. of ICLR, 2019.
Abstract Code
Understanding Back-Translation at Scale
Sergey Edunov, Myle Ott, David Grangier, Michael Auli. In Proc. of EMNLP, 2018.
Abstract Code
Convolutional Sequence to Sequence Learning
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin. In Proc. of ICML, 2017.
Abstract Blog Code
Sequence Level Training with Recurrent Neural Networks
Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba. In Proc. of ICLR, 2016.
Abstract Code



Unified self-supervised learning for speech, vision and NLP
Talk at ECCV Workshop on Perception
wav2vec: Self-supervised learning of speech representations
Talk at MIT, CMU, University of Edinburgh, Spring 2021.
Efficient Sequence Modeling
Talk at WNGT'19, Stanford, Berkeley, Nov 2019.
Sequence to Sequence Learning: Fast Training and Inference with Gated Convolutions
Talk at Johns Hopkins University, Oct 2017.
Learning to translate with neural networks
Talk at Facebook, Google, Amazon and the University of Washington, 2014.
Integrated Parsing and Tagging
Talk at Carnegie Mellon University, Johns Hopkins University, BBN Technologies, IBM Research and Microsoft Research, 2011.