Estimating the 3D structure of the human body from natural scenes is a
f...
Detection-based methods have been viewed unfavorably in crowd analysis d...
Recently, digital humans for interpersonal interaction in virtual
enviro...
In most cases, bilingual TTS needs to handle three types of input script...
The ability to associate touch with sight is essential for tasks that re...
Recent years have witnessed a great development of Convolutional Neural
...
Learning emotion embedding from reference audio is a straightforward app...
Attention-based seq2seq text-to-speech systems, especially those use
sel...