The recently proposed serialized output training (SOT) simplifies multi-...
Self-supervised speech pre-training empowers the model with the contextu...
Speech data on the Internet are proliferating exponentially because of t...
Background sound is an informative form of art that is helpful in provid...
Recently cross-channel attention, which better leverages multi-channel s...
This paper presents the NWPU-ASLP speaker anonymization system for Voice...
Transformer-based models have demonstrated their effectiveness in automa...
General accent recognition (AR) models tend to directly extract low-leve...
The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Ch...
Recent development of speech signal processing, such as speech recogniti...
Self-supervised pretraining on speech data has achieved a lot of progres...
In this paper, we present WenetSpeech, a multi-domain Mandarin corpus co...
This paper describes the ESPnet-ST group's IWSLT 2021 submission in the ...
Non-autoregressive (NAR) models have achieved a large inference computat...
Continuous integrate-and-fire (CIF) based models, which use a soft and m...
This paper describes the recent development of ESPnet (https://github.co...
Conversational speech recognition is regarded as a challenging task due ...
In real-life applications, the performance of speaker recognition system...
In this study, we present recent developments on ESPnet: End-to-End Spee...
Neural sequence-to-sequence models are well established for applications...
Speaker recognition is a popular topic in biometric authentication and m...
In this paper, we present our overall efforts to improve the performance...