Speech to text models tend to be trained and evaluated against a single
...
With 4.5 million hours of English speech from 10 different sources acros...
Hybrid automatic speech recognition (ASR) models are typically sequentia...
Language identification greatly impacts the success of downstream tasks ...
In this work, to measure the accuracy and efficiency for a latency-contr...
End-to-end automatic speech recognition (ASR) models with a single neura...
Attention-based models have been gaining popularity recently for their s...
In this paper, we summarize the application of transformer and its strea...
This paper proposes an efficient memory transformer Emformer for low lat...
In this work, we first show that on the widely used LibriSpeech benchmar...
Transformers, originally proposed for natural language processing (NLP)
...
Transformer-based acoustic modeling has achieved great suc-cess for both...
Videos uploaded on social media are often accompanied with textual
descr...
Supervised ASR models have reached unprecedented levels of accuracy, tha...
Deep acoustic models typically receive features in the first layer of th...
We propose and evaluate transformer-based acoustic models (AMs) for hybr...