The duality of content and style is inherent to the nature of art. For h...
Interpreting and explaining the behavior of deep neural networks is crit...
Image captioning models are known to perpetuate and amplify harmful soci...
The increasing tendency to collect large and uncurated datasets to train...
Human evaluation is critical for validating the performance of text-to-i...
Vision Transformers (ViTs) are becoming a very popular paradigm for visi...
Video summarization aims to select the most informative subset of frames...
Body language such as conversational gesture is a powerful way to ease c...
Is more data always better to train vision-and-language models? We study...
Vision-and-language tasks have drawn increasing attention as a me...
Evaluation measures have a crucial impact on the direction of research. ...
We study societal bias amplification in image captioning. Image captioni...
Mean Average Precision (mAP) is the primary evaluation measure for objec...
Video question answering (VideoQA) is designed to answer a given questio...
Have you ever looked at a painting and wondered what is the story behind...
Buddha statues are a part of human culture, especially in Asia,...
We propose a new 2D pose refinement network that learns to predict the h...
How far can we go with textual representations for understanding picture...
The rise of digitization of cultural documents offers large-scale conten...
Deep learning is a rapidly evolving technology with the possibility to signi...
Visual Question Answering (VQA) is of tremendous interest to the researc...
Few-shot learning (FSL) approaches are usually based on an assumption th...
The status of retinal arteriovenous crossing is of great significance fo...
Semantic video segmentation is a key challenge for various applications....
A visual relationship denotes a relationship between two objects in an i...
Computational art analysis has, through its reliance on classification t...
Explainable artificial intelligence is gaining attention. However, most ...
Query-based moment retrieval is the problem of localising a specific c...
Answering questions related to art pieces (paintings) is a difficult tas...
Conventional 3D convolutional neural networks (CNNs) are computationally...
To understand movies, humans constantly reason over the dialogues and ac...
Retinal imaging serves as a valuable tool for diagnosis of various disea...
Human pose estimation is a well-known problem in computer vision to loca...
We propose a novel video understanding task by fusing knowledge-based an...
Retinal vessel segmentation is of great interest for diagnosis of retina...
We propose a novel video understanding task by fusing knowledge-based an...
We introduce BUDA.ART, a system designed to assist researchers in Art Hi...
While Buddhism has spread along the Silk Roads, many pieces of art have ...
In computer vision, visual arts are often studied from a purely aestheti...
Automatic art analysis aims to classify and retrieve artistic representa...
Video summarization is a technique to create a short skim of the origina...
Reconstruction of the shape and motion of humans from RGB-D is a challen...
A paraphrase is a restatement of the meaning of a text in other words. P...
Automatically generating a summary of sports video poses the challenge o...
This paper presents a video summarization technique for an Internet vide...
Our objective is video retrieval based on natural language queries. In a...