In recent years, image generation has shown a great leap in performance,...
Recent advances in text-to-image diffusion models have enabled the gener...
We seek to semantically describe a set of images, capturing both the
att...
We introduce a zero-shot video captioning method that employs two frozen...
It has been observed that visual classification models often rely mostly...
The success of deep neural nets heavily relies on their ability to encod...
Recent text-to-image matching models apply contrastive learning to large...
Machine learning advances in the last decade have relied significantly o...
We present a method for matching a text sentence from a given corpus to ...
We address the problem of visual storytelling, i.e., generating a story ...
Assessing an AI agent that can converse in human language and understand...
Many recent datasets contain a variety of different data modalities, for...
Dialog is an effective way to exchange information, but subtle details a...
The recently proposed audio-visual scene-aware dialog task paves the way...
The quest for algorithms that enable cognitive abilities is an important...