Textually Enriched Neural Module Networks for Visual Question Answering

09/23/2018
by Khyathi Raghavi Chandu, et al.

Problems at the intersection of language and vision, such as visual question answering, have recently gained considerable attention in multi-modal machine learning as computer vision research moves beyond traditional recognition tasks. Recent successes in visual question answering come from deep neural network models that use the linguistic structure of a question to dynamically instantiate network layouts. In converting the question to a network layout, however, the question is simplified, and the model loses information. In this paper, we enrich the image information with textual data from image captions and external knowledge bases to generate more coherent answers. We achieve 57.1% accuracy on the test-dev open-ended questions from the visual question answering (VQA 1.0) real image dataset.


