Expanding the language coverage of speech technology has the potential t...
We present a single neural network architecture composed of task-agnosti...
Self-supervision has shown great potential for audio-visual speech recog...
Current self-supervised learning algorithms are often modality-specific ...
Recent studies find existing self-supervised speech encoders contain pri...
This paper studies a simple extension of image-based Masked Autoencoders...
Self-supervised learning (SSL) of speech representations has received mu...
How should we learn visual representations for embodied agents that must...
Squeeze and Efficient Wav2vec (SEW) is a recently proposed architecture ...
We describe a method to jointly pre-train speech and text in an encoder-...
We introduce the first unsupervised speech synthesis system based on a s...
Unsupervised speech recognition has shown great potential to make Automa...
Human speech data comprises a rich set of domain factors such as accent,...
While the general idea of self-supervised learning is identical across m...
This paper presents XLS-R, a large-scale model for cross-lingual speech ...
Recent progress in self-training, self-supervised pretraining and unsupe...
Language identification greatly impacts the success of downstream tasks ...
Despite rapid progress in the recent past, current speech recognition sy...
In this paper, we improve speech translation (ST) through effectively le...
Self-supervised learning of speech representations has been a very activ...
Generative spoken language modeling involves learning jointly the acoust...
We demonstrate that transformers obtain impressive performance even when...
We introduce a new unsupervised task, spoken language modeling: the lear...
Neural latent variable models enable the discovery of interesting struct...
Self-training and unsupervised pre-training have emerged as effective ap...
This paper presents XLSR which learns cross-lingual speech representatio...
We show for the first time that learning powerful representations from s...
We present pre-training approaches for self-supervised representation le...
We propose vq-wav2vec to learn discrete representations of audio segment...
This paper describes Facebook FAIR's submission to the WMT19 shared news...
We explore unsupervised pre-training for speech recognition by learning ...
fairseq is an open-source sequence modeling toolkit that allows research...
Pre-trained language model representations have been successful in a wid...
We present a new approach for pretraining a bi-directional transformer m...
Self-attention is a useful mechanism to build generative models for lang...
We introduce adaptive input representations for neural language modeling...