Multimodal processing has attracted much attention lately, especially wit...
Dense video captioning aims to generate corresponding text descriptions ...
Contour trees have been developed to visualize or encode scalar data in ...
Image retrieval with hybrid-modality queries, also known as composing te...
Translating e-commerce product descriptions, a.k.a. product-oriented ma...
Entities Object Localization (EOL) aims to evaluate how grounded or fait...
Video paragraph captioning aims to describe multiple events in untrimmed...
Detecting meaningful events in an untrimmed video is essential for dense...
This notebook paper presents our model in the VATEX video captioning cha...
Generating image descriptions in different languages is essential to sat...
Contextual reasoning is essential to understand events in long untrimmed...
This notebook paper presents our system in the ActivityNet Dense Caption...