Recently, there have been significant advancements in voice conversion, ...
Recent strides in neural speech synthesis technologies, while enjoying w...
Auditory Attention Detection (AAD) aims to detect the target speaker from br...
Audio deepfake detection is an emerging active topic. A growing number o...
The rhythm of synthetic speech is usually too smooth, which causes that ...
Current fake audio detection algorithms have achieved promising performa...
End-to-end model, especially Recurrent Neural Network Transducer (RNN-T)...
Dynamic facial expression recognition (DFER) is essential to the develop...
Multimodal emotion recognition is an active research topic in artificial...
Denoising Diffusion Probabilistic Models have shown extraordinary abilit...
Self-supervised speech models are a rapidly developing research topic in...
The rapid advancement of spoofing algorithms necessitates the developmen...
Audio deepfake detection is an emerging topic in the artificial intellig...
Current fake audio detection relies on hand-crafted features, which lose...
Existing fake audio detection systems perform well in in-domain testing,...
Conversational text-to-speech (TTS) aims to synthesize speech with prope...
Over the past few decades, multimodal emotion recognition has made remar...
Noisy partial label learning (noisy PLL) is an important branch of weakl...
Text-to-speech (TTS) and voice conversion (VC) are two different tasks b...
Text-based speech editing allows users to edit speech by intuitively cut...
Previous databases have been designed to further the development of fake...
There are already some datasets used for fake audio detection, such as t...
Partial label learning (PLL) is a typical weakly supervised learning problem, wh...
Current end-to-end code-switching Text-to-Speech (TTS) can already gener...
Speech is the fundamental mode of human communication, and its synthesis...
Many effective attempts have been made for deepfake audio detection. How...
Many effective attempts have been made for fake audio detection. However...
The existing fake audio detection systems often rely on expert experienc...
With the proliferation of user-generated online videos, Multimodal Senti...
Recently, pioneering research works have proposed a large number of acousti...
In this paper, we propose the solution to the Multi-Task Learning (MTL) ...
Fake audio detection is a growing concern and some relevant datasets hav...
Temporal knowledge prediction is a crucial task for the event early warn...
Speech emotion recognition (SER) is a crucial research topic in human-co...
Traditional vocoders have the advantages of high synthesis efficienc...
Conversations have become a critical data format on social media platfor...
The text-based speech editor allows the editing of speech through intuit...
Knowledge graph embedding (KGE) aims to represent entities and relations...
Audio deepfake detection is an emerging topic, which was included in the...
End-to-end singing voice synthesis (SVS) is attractive due to the avoida...
Code-switching is about dealing with alternative languages in the commun...
Knowledge Graphs (KGs) have shown great success in recommendation. This ...
Graph representation learning has attracted a surge of interest recently...
Fake audio attacks have become a major threat to the speaker verification sys...
Diverse promising datasets have been designed to hold back the developme...
Transducer-based models, such as RNN-Transducer and transformer-transduc...
The autoregressive (AR) models, such as attention-based encoder-decoder ...
Attention-based encoder-decoder (AED) models have achieved promising per...
Knowledge graphs have been demonstrated to be an effective tool for nume...
Recurrent neural networks (RNNs) have shown significant improvements in ...