We report a controlled study investigating the effect of visual informat...
While state-of-the-art NLP models have demonstrated excellent performanc...
Large pre-trained language models such as BERT have been widely used as ...
Previous work has established that RNNs with an unbounded activation fun...
Providing explanations for visual question answering (VQA) has gained mu...
In this study, we investigate the generalization of LSTM, ReLU and GRU m...
In this work, we focus on improving the captions generated by image-capt...
Recent advances in fake news detection have exploited the success of
lar...
Numerical reasoning based machine reading comprehension is a task that
i...
We present BERTGEN, a novel generative, decoder-only model which extends...
In this paper we present a controlled study on the linearized IRM framew...
Reasoning about information from multiple parts of a passage to derive a...
This paper introduces a large-scale multimodal and multilingual dataset ...
This paper addresses the problem of simultaneous machine translation (Si...
Pre-trained language models have been shown to improve performance in ma...
Automatic generation of video descriptions in natural language, also cal...
Automatic evaluation of language generation systems is a well-studied pr...
Simultaneous machine translation (SiMT) aims to translate a continuous i...
Public opinion influences events, especially related to stock market
mov...
This paper describes the Imperial College London team's submission to th...
In this paper, we focus on quantifying model stability as a function of
...
Recent literature shows that large-scale language modeling provides exce...
We address the task of text translation on the How2 dataset using a stat...
We address the task of evaluating image description generation systems. ...
Previous work on multimodal machine translation has shown that visual
in...
Explaining and interpreting the decisions of recommender systems are bec...
Current work on multimodal machine translation (MMT) has suggested that ...
One third of stroke survivors have language difficulties. Emerging evide...
An increasing number of datasets contain multiple views, such as video, ...
We hypothesize that end-to-end neural image captioning systems work seem...
We address the task of detecting foiled image captions, i.e. identifying...
The use of explicit object detectors as an intermediate step to image
ca...