Recent advancements in speech synthesis have leveraged GAN-based network...
In recent years, large-scale pre-trained speech language models (SLMs) h...
In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that...
Lifelong audio feature extraction involves learning new sound classes in...
Binaural speech separation in real-world scenarios often involves moving...
Auditory attention decoding (AAD) is a technique used to identify and am...
Large-scale pre-trained language models have been shown to be helpful in...
One-shot voice conversion (VC) aims to convert speech from any source sp...
Text-to-Speech (TTS) has recently seen great progress in synthesizing hi...
We present an unsupervised non-parallel many-to-many voice conversion (V...
Most speech separation methods, trying to separate all channel sources s...
Leveraging additional speaker information to facilitate speech separatio...
Ultra-lightweight model design is an important topic for the deployment ...
Various neural network architectures have been proposed in recent years ...
Modules in all existing speech separation networks can be categorized in...
Model size and complexity remain the biggest challenges in the deploymen...
Many recent source separation systems are designed to separate a fixed n...
Deep learning speech separation algorithms have achieved great success i...
An important problem in ad-hoc microphone speech separation is how to gu...
Beamforming has been extensively investigated for multi-channel audio pr...
Robust speech processing in multitalker acoustic environments requires a...
Robust speech processing in multi-talker environments requires effective...
In this study, we propose a deep neural network for reconstructing intel...
Despite the recent success of deep learning for many speech processing t...
Deep clustering is the first method to handle general audio separation s...