Large self-supervised pre-trained speech models require computationally ...
Heart auscultations are a low-cost and effective way of detecting valvul...
In this paper, we propose ACA-Net, a lightweight, global context-aware s...
Most of the existing neural-based models for keyword spotting (KWS) in s...
Existing self-supervised pre-trained speech models have offered an effec...
This paper focuses on multi-enrollment speaker recognition which natural...
This paper summarizes the outcomes from the ISCSLP 2022 Intelligent Cock...
To let the state-of-the-art end-to-end ASR model enjoy data efficiency, ...
Noise robustness in keyword spotting remains a challenge as many models ...
Continuously learning new classes without catastrophic forgetting is a c...
Intermediate layer output (ILO) regularization by means of multitask tra...
Internal Language Model Estimation (ILME) based language model (LM) fusi...
In this paper, we tackle the new Language-Based Audio Retrieval task pro...
It is critical for a keyword spotting model to have a small footprint as...
Catastrophic forgetting is a thorny challenge when updating keyword spot...
Speaker extraction aims to extract the target speaker's voice from a mul...
Building efficient architecture in neural speech processing is paramount...
Transformer models have been used in automatic speech recognition (ASR) ...
To realize robust end-to-end Automatic Speech Recognition (E2E ASR) under...
Automatic height and age estimation of speakers using acoustic features ...
Speaker extraction uses a pre-recorded reference speech as the reference...
Automatic speech recognition (ASR) for under-represented named-entity (U...
Humans can perform multi-task recognition from speech. For instance, huma...
Domain adaptation or transfer learning using pre-trained language models...
In this work, we study leveraging extra text data to improve low-resourc...
In this paper, we present a series of complementary approaches to improv...
Speaker extraction aims to extract the target speech signal from a multi...
Speaker extraction is to extract a target speaker's voice from multi-tal...
Speaker extraction aims to mimic humans' selective auditory attention by...
The attention-based end-to-end (E2E) automatic speech recognition (ASR) ...
The I4U consortium was established to facilitate a joint entry to NIST s...
The lack of code-switch training data is one of the major concerns in th...
Neural language models (NLM) achieve strong generalization capabilit...
The SpeakerBeam-FE (SBF) method is proposed for speaker extraction. It a...
In a typical voice conversion system, a vocoder is commonly used for speec...
The performance of speaker verification degrades significantly when the ...
Code-switching (CS) refers to a linguistic phenomenon where a speaker us...
In automatic speech recognition (ASR) systems, recurrent neural network ...
In this paper, we present our overall efforts to improve the performance...