Current speaker recognition systems primarily rely on supervised approac...
Speaker extraction and diarization are two crucial enabling techniques f...
Deep neural network-based systems have significantly improved the perfor...
Music editing primarily entails the modification of instrument tracks or...
Neural speech separation has made remarkable progress and its integratio...
The mismatch between close-set training and open-set testing usually lea...
Self-supervised learning (SSL) based speech pre-training has attracted m...
Due to the rapid development of computing hardware resources and the dra...
This paper proposes a novel Attention-based Encoder-Decoder network for ...
The automatic speaker verification task has made great achievements using de...
Code-switching speech refers to a means of expression by mixing two or m...
Traditional automatic speech recognition (ASR) systems usually focus on ...
Different speaker recognition challenges have been held to assess the sp...
Speaker modeling is essential for many related tasks, such as speaker re...
In real application scenarios, it is often challenging to obtain a large...
This report describes the SJTU-AISPEECH system for the Voxceleb Speaker ...
This paper presents the SJTU system for both text-dependent and text-ind...
For self-supervised speaker verification, the quality of pseudo labels d...
This paper presents recent progress on integrating speech separation and...
Modern non-autoregressive (NAR) speech recognition systems aim to accele...
This technical report describes the SJTU X-LANCE Lab system for the thre...
Accent variability has posed a huge challenge to automatic speech recogn...
We develop an end-to-end system for multi-channel, multi-speaker automat...
The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand Ch...
Continuous speech separation for meeting pre-processing has recently bec...
Multi-talker conversational speech processing has drawn much interest f...
Deep learning-based time-domain models, e.g., Conv-TasNet, have shown...
Self-supervised learning (SSL) achieves great success in speech recognit...
The advances in attention-based encoder-decoder (AED) networks have brou...
The speech representations learned from large-scale unlabeled data have ...
Large performance degradation is often observed for speaker verificatio...
Recent studies have shown that neural vocoders based on generative adver...
Continuous speech separation (CSS) is a task to separate the speech ...
Recently, the end-to-end approach has been successfully applied to multi...
The variety of accents has posed a big challenge to speech recognition. ...
This paper describes the AISpeech-SJTU system for the accent identificat...
Data augmentation is commonly used to help build a robust speaker verifi...
Time-domain training criteria have proven to be very effective for the s...
Training a code-switching end-to-end automatic speech recognition (ASR) ...
Voice activity detection (VAD) is an essential pre-processing step for t...
Language models (LM) play an important role in large vocabulary continuo...
Although recent progress in speaker verification generates powerful models...
Despite successful applications of end-to-end approaches in multi-channe...
Recently, fully recurrent neural network (RNN) based end-to-end models h...
Recently, the end-to-end approach has proven its efficacy in monaural mu...
Recently, speaker embeddings extracted from a speaker discriminative dee...
Recently, end-to-end models have become a popular approach as an alterna...
Speech recognition is a sequence prediction problem. Besides employing v...
Speaker-aware source separation methods are promising workarounds for ma...
Linear Discriminant Analysis (LDA) has been used as a standard post-proc...