Vision and Language (VL) models offer an effective method for aligning
r...
Large-scale pre-trained Vision Language (VL) models have shown remar...
Computer vision models suffer from a phenomenon known as catastrophic
fo...
Generalized Zero-Shot Learning (GZSL) aims to train a classifier that ca...
Recently, large-scale pre-trained Vision-and-Language (VL) foundation mo...
Existing work on VQA explores data augmentation to achieve better
genera...
Convolutional neural networks for visual recognition require large amoun...
Semi-supervised learning aims to take advantage of a large amount of
unl...
This paper explores the task of interactive image retrieval using natura...
Film media is a rich form of artistic expression. Unlike photography, an...