This work addresses the problem of online exploration and visual sensor
...
Cross-modal retrieval (CMR) has been extensively applied in various doma...
Most existing sandstorm image enhancement methods are based on tradition...
This paper integrates graph-to-sequence into an end-to-end text-to-speec...
Generating realistic talking faces is a complex and widely discussed tas...
The rise of the phenomenon of the "right to be forgotten" has prompted
r...
Voice conversion is a method that allows for the transformation of speak...
Music Emotion Recognition involves the automatic identification of emoti...
In the realm of Large Language Models, the balance between instruction d...
Image Quality Assessment (IQA) constitutes a fundamental task within the...
Voice conversion, as the style transfer task applied to speech, refers to...
Chinese Automatic Speech Recognition (ASR) error correction presents
sig...
Conversational Question Answering (CQA) is a challenging task that aims ...
Federated Learning (FL) has attracted wide attention because it enables
decentr...
Image inpainting for completing complicated semantic environments and di...
There has been significant progress in emotional Text-To-Speech (TTS)
sy...
Typically, the Time-Delay Neural Network (TDNN) and Transformer can serv...
In recent Text-to-Speech (TTS) systems, a neural vocoder often generates...
Text summarization is essential for information aggregation and demands ...
Out-of-distribution (OOD) detection aims at enhancing standard deep neur...
Value-decomposition methods, which reduce the difficulty of a multi-agen...
Deep neural retrieval models have amply demonstrated their power but
est...
Deep neural networks have achieved remarkable performance in retrieval-b...
By predicting all the target tokens in parallel, the
non-autoreg...
Recent expressive text to speech (TTS) models focus on synthesizing emot...
Music genre classification has been widely studied in the past few years for...
Data-Free Knowledge Distillation (DFKD) has recently attracted growing
a...
We present a novel method for enabling humanoid robots to learn a wide r...
This paper addresses the problem of enabling a robot to search for a sem...
This paper introduces a novel and general method to address the problem ...
Transformer-based approaches have been successfully proposed for 3D huma...
Lymph node (LN) metastasis status is one of the most critical prognostic...
The recent emergence of the joint CTC-Attention model shows significant
impr...
Recent advances in pre-trained language models have improved the perform...
Most previous neural text-to-speech (TTS) methods are mainly based on
su...
The metaverse expands the physical world to a new dimension, and the physica...
Recovering the masked speech frames is widely applied in speech
represen...
In this paper, we propose Adapitch, a multi-speaker TTS method that mak...
Estimating age from a single speech is a classic and challenging topic.
...
Unsupervised representation learning for speech audio attained impressi...
Since the beginning of the COVID-19 pandemic, remote conferencing and
sc...
Pose Guided Human Image Synthesis (PGHIS) is a challenging task of
trans...
Visual-Semantic Embedding (VSE) aims to learn an embedding space where
r...
Machine learning models (mainly neural networks) are used more and more ...
The extraction of sequence patterns from a collection of functionally li...
The Transformer architecture model, based on self-attention and multi-he...
Buddhism is an influential religion with a long-standing history and pro...
Nonparallel multi-domain voice conversion methods such as the StarGAN-VC...
Automatically measuring lesion/tumor size with RECIST (Response Evaluati...
Non-parallel many-to-many voice conversion remains an interesting but
ch...