Text-to-speech (TTS) and singing voice synthesis (SVS) aim at generating...
In the current two-stage neural text-to-speech (TTS) paradigm, it is ideal t...
Speaker adaptation in text-to-speech synthesis (TTS) is to finetune a
Building a high-quality singing corpus for a person who is not good at
This paper introduces Opencpop, a publicly available high-quality Mandar...
Expressive synthetic speech is essential for many human-computer interac...
Automatically generating videos in which synthesized speech is synchroni...
This paper proposes a new model, referred to as the show and speak (SAS)...
An estimated half of the world's languages do not have a written form, m...
In generalized zero-shot learning, synthesizing unseen data with
The development of deep convolutional neural network architectures is cri...
Zero-shot learning, which aims to recognize new categories that are not