Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speake...
This paper presents an end-to-end high-quality singing voice synthesis (...
The spontaneous behavior that often occurs in conversations makes speech...
For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) ...
Expressive speech synthesis is crucial for many human-computer interacti...
Music-driven 3D dance generation has become an intensive research topic ...
Due to the mismatch between the source and target domains, how to better...
Recent advances in text-to-speech have significantly improved the
expres...
Previous works on expressive speech synthesis focus on modelling the
mon...
Non-parallel data voice conversion (VC) has achieved considerable
break...
Previous works on expressive speech synthesis mainly focus on current
se...
Previously proposed FullSubNet has achieved outstanding performance in D...
This paper describes a variational auto-encoder based non-autoregressive...
Generating 3D speech-driven talking heads has received more and more atte...
Automatic recognition of overlapped speech remains a highly challenging ...
In this paper, we present a generic and robust multimodal synthesis syst...
End-to-end speech synthesis methods such as Tacotron, Tacotron2 and
Trans...