End-to-end model, especially Recurrent Neural Network Transducer (RNN-T)...
Previous databases have been designed to further the development of fake...
The CTC model has been widely applied to many application scenarios beca...
Many effective attempts have been made for deepfake audio detection. How...
The existing fake audio detection systems often rely on expert experienc...
Audio deepfake detection is an emerging topic, which was included in the...
Code-switching is about dealing with alternative languages in the
commun...
Fake audio attack becomes a major threat to the speaker verification sys...
Diverse promising datasets have been designed to hold back the developme...
Transducer-based models, such as RNN-Transducer and transformer-transduc...
The autoregressive (AR) models, such as attention-based encoder-decoder
...
Attention-based encoder-decoder (AED) models have achieved promising
per...
The joint training framework for speech enhancement and recognition meth...
Despite the recent significant advances witnessed in end-to-end (E2E) AS...
Non-autoregressive transformer models have achieved extremely fast infer...
Although attention based end-to-end models have achieved promising
perfo...
Previous studies demonstrate that word embeddings and part-of-speech (PO...
Recently, language identity information has been utilized to improve the...
For most of the attention-based sequence-to-sequence models, the decoder...
Because an attention based sequence-to-sequence speech (Seq2Seq) recogni...
Recurrent neural network transducers (RNN-T) have been successfully appl...
Integrating an external language model into a sequence-to-sequence speec...