A reliable and comprehensive evaluation metric that aligns with manual p...
We introduce a new conversation head generation benchmark for synthesizi...
Multimodal fusion integrates the complementary information present in mu...
Dynamically synthesizing talking speech that actively responds to a list...
The gap between low-level visual signals and high-level semantics has be...
Vision Transformer (ViT) has become a leading tool in various computer v...
People naturally conduct spontaneous body motions to enhance their speec...
Responsive listening during face-to-face conversations is a critical ele...
Only a few cherry-picked robust augmentation policies are beneficial to ...
Data augmentation is practically helpful for visual recognition, especia...
The significant progress on Generative Adversarial Networks (GANs) has f...
With the rapid development of electronic commerce, the way of shopping h...
Most object recognition approaches predominantly focus on learning discr...
The significant progress on Generative Adversarial Networks (GANs) has ...
Relationships, as the bond of isolated entities in images, reflect the i...
Large-scale image datasets and deep convolutional neural networks (DCNNs) a...
Convolutional Neural Networks (CNNs) have achieved comparable error rate...
Image retrieval refers to finding relevant images from an image database...