The last year has seen astonishing progress in text-prompted image gener...
Unpaired text and audio injection have emerged as dominant methods for
i...
Dual learning is a paradigm for semi-supervised machine learning that se...
We explore unifying a neural segmenter with two-pass cascaded encoder AS...
The careful construction of audio representations has become a dominant
...
Improving the performance of end-to-end ASR models on long utterances ra...
Language models (LMs) significantly improve the recognition accuracy of
...
Language model fusion helps smart assistants recognize words which are r...
We introduce Lookup-Table Language Models (LookupLM), a method for scali...
End-to-end (E2E) automatic speech recognition (ASR) systems lack the dis...
Proper nouns present a challenge for end-to-end (E2E) automatic speech
r...
Thus far, end-to-end (E2E) models have not been shown to outperform
stat...
Recognizing written domain numeric utterances (e.g. I need 1.25.) can be...