Scene-aware Complementary Item Retrieval (CIR) is a challenging task whi...
Predicting panoramic indoor lighting from a single perspective image is ...
Myocardial pathology segmentation (MyoPS) is critical for the risk strat...
Text-to-speech (TTS) and singing voice synthesis (SVS) aim at generating...
Myocardial pathology segmentation (MyoPS) can be a prerequisite for the ...
In this work we focus on automatic segmentation of multiple anatomical s...
The zero-shot scenario for speech generation aims at synthesizing a nove...
Building a voice conversion system for noisy target speakers, such as us...
The ideal goal of voice conversion is to convert the source speaker's sp...
Though significant progress has been made for speaker-dependent Video-to...
Lighting prediction from a single image is becoming increasingly importa...
Expressive synthetic speech is essential for many human-computer interac...
Color image steganography based on deep learning is the art of hiding in...
Cross-speaker style transfer (CSST) in text-to-speech (TTS) synthesis ai...
Humans perceive the world by concurrently processing and fusing high-dim...
Current two-stage TTS frameworks typically integrate an acoustic model w...
In spoken conversations, spontaneous behaviors like filled pauses and pro...
We present a method to infer the 3D pose of mice, including the limbs an...
Few-shot relation extraction (FSRE) is of great importance in long-tail ...
In this paper, we present a transformer-based learning framework for 3D ...
An emotion embedding space learned from references is a straightforward app...
This paper proposes a unified model to conduct emotion transfer, control...
Singing voice synthesis has received growing attention with the rapid de...
Data-efficient voice cloning aims at synthesizing a target speaker's voice...
Attention-based seq2seq text-to-speech systems, especially those that use sel...
In this paper, we propose multi-band MelGAN, a much faster waveform gene...
Attention-based sequence-to-sequence (seq2seq) speech synthesis has achi...
In this paper, we aim at improving the performance of synthesized speech...
Most recent garment capturing techniques rely on acquiring multiple view...
Humans refer to objects in their environments all the time, especially i...