Visual question answering (VQA) is a critical multimodal task in which a...
Unsupervised domain adaption (UDA) is a transfer learning task where the...
Image-text matching is an important multi-modal task with massive
applic...
Recently, several approaches have been proposed to solve language genera...
Image description generation plays an important role in many real-world
...
Person re-identification (re-ID) aims to recognize a person-of-interest
...
Automatically generating the descriptions of an image, i.e., image
capti...
Recurrent Neural Networks (RNNs) have been widely used in natural langua...
Recently, the soft attention mechanism, which was originally proposed in...