Vision-language models (VLMs) have recently demonstrated strong efficacy...
The problem of community-level information pathway prediction (CLIPP) ai...
The success of deep learning has enabled advances in multimodal tasks th...
We focus on Multimodal Machine Reading Comprehension (M3C) where a model...
This paper targets the problem of procedural multimodal machine comprehe...
This paper proposes a new approach to detecting neural Trojans on Deep N...
We target the problem of detecting Trojans or backdoors in DNNs. Such mo...
We improve zero-shot learning (ZSL) by incorporating common-sense knowle...
We study an important, yet largely unexplored problem of large-scale cro...
We introduce Deep Adaptive Semantic Logic (DASL), a novel framework for ...
While models for Visual Question Answering (VQA) have steadily improved ...
Food classification is a challenging problem due to the large number of ...
There has been an explosion of multimodal content generated on social me...
Computing author intent from multimodal data like Instagram posts requir...
We address the problem of grounding free-form textual phrases by using w...
We present a novel method for fusing appearance and semantic information...
We tackle the problem of understanding visual ads where given an ad imag...
We introduce and tackle the problem of zero-shot object detection (ZSD),...
Food classification from images is a fine-grained classification problem...
We propose a novel method for temporally pooling frames in a video for t...
We study the problem of video classification for facial analysis and hum...
We study the problem of facial analysis in videos. We propose a novel we...
An active object recognition system has the advantage of being able to a...
In recent years, Printed Circuit Boards (PCBs) have become the backbone o...