Recent advancements in speech synthesis have leveraged GAN-based network...
In recent years, large-scale pre-trained speech language models (SLMs) h...
In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that ...
A key challenge in machine learning is to generalize from training data ...
Recently, the zero-shot semantic segmentation problem has attracted incr...
Binaural speech separation in real-world scenarios often involves moving...
Auditory attention decoding (AAD) is a technique used to identify and am...
Large-scale pre-trained language models have been shown to be helpful in...
One-shot voice conversion (VC) aims to convert speech from any source sp...
Neural Architecture Search (NAS) has become a de facto approach in the r...
Text-to-Speech (TTS) has recently seen great progress in synthesizing hi...
This work describes a speech denoising system for machine ears that aims...
Conversational recommendation systems (CRS) engage with users by inferri...
Continuous speech separation (CSS) is the task of separating the speech ...
Leveraging additional speaker information to facilitate speech separatio...
Ultra-lightweight model design is an important topic for the deployment ...
Auto-bidding plays an important role in online advertising and has becom...
Modules in all existing speech separation networks can be categorized in...
Model size and complexity remain the biggest challenges in the deploymen...
Deep learning speech separation algorithms have achieved great success i...
Beamforming has been extensively investigated for multi-channel audio pr...