We introduce O-1, a new self-training objective to reduce training bias ...
Accurate recognition of specific categories, such as persons' names, dat...
The last year has seen astonishing progress in text-prompted image gener...
Recently, a number of approaches to train speech models by incorporatin...
We introduce the Universal Speech Model (USM), a single large model that...
We propose JEIT, a joint end-to-end (E2E) model and internal language mo...
This paper proposes Virtuoso, a massively multilingual speech-text joint...
Data augmentation is a ubiquitous technique used to provide robustness t...
Training state-of-the-art Automated Speech Recognition (ASR) models typi...
Automatic speech recognition (ASR) needs to be robust to speaker differe...
Building inclusive speech recognition systems is a crucial step towards ...
We present Maestro, a self-supervised training method to unify represent...
Model fine-tuning and adaptation have become a common approach for model...
Masked speech modeling (MSM) methods such as wav2vec2 or w2v-BERT learn ...
Self-supervised pretraining for Automated Speech Recognition (ASR) has s...
Recent neural text-to-speech (TTS) models with fine-grained latent featu...
Recent success of the Tacotron speech synthesis architecture and its var...
We present a multispeaker, multilingual text-to-speech (TTS) synthesis m...
Ranking is used for a wide array of problems, most notably information r...
The performance of automatic speech recognition systems degrades with in...
End-to-end (E2E) systems have achieved competitive results compared to c...
For classification problems, feature extraction is a crucial process whi...