University of Amsterdam and Renmin University at TRECVID 2017: Searching Video, Detecting Events and Describing Video
In this paper, we summarize our TRECVID 2017 video recognition and retrieval experiments. We participated in three tasks: video search, event detection and video description. For both video search and event detection, we explore semantic representations based on VideoStory and ImageNet Shuffle, which thrive in few-example regimes. For the video description task, we experiment with Word2VisualVec, a deep network that predicts a visual representation from a natural language description, and use this space for sentence matching. For generative description, we enhance a neural image captioning model with Early Embedding and Late Reranking. The 2017 edition of the TRECVID benchmark was a fruitful participation for our joint team, resulting in the best overall result for video search and event detection, as well as the runner-up position for video description.
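The Word2VisualVec idea of matching sentences to videos in the visual feature space can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes mean-pooled word embeddings as the sentence encoding, a small two-layer MLP as the text-to-visual projection, and cosine similarity for ranking; all weights and features here are random placeholders.

```python
import numpy as np

def encode_sentence(tokens, word_vectors):
    # Mean-pool word embeddings as a simple sentence encoding
    # (assumption: the paper's model may use richer pooling).
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0)

def to_visual_space(sent_vec, W1, b1, W2, b2):
    # Two-layer MLP projecting a sentence vector into visual feature space.
    h = np.tanh(sent_vec @ W1 + b1)
    return h @ W2 + b2

def rank_videos(pred_feat, video_feats):
    # Rank candidate video features by cosine similarity to the prediction.
    a = pred_feat / np.linalg.norm(pred_feat)
    b = video_feats / np.linalg.norm(video_feats, axis=1, keepdims=True)
    return np.argsort(-(b @ a))

# Toy demo with random weights and features (hypothetical dimensions).
rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=50) for w in ["a", "dog", "runs"]}
W1, b1 = rng.normal(size=(50, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 128)), np.zeros(128)

sent = encode_sentence(["a", "dog", "runs"], word_vectors)
pred = to_visual_space(sent, W1, b1, W2, b2)
video_feats = rng.normal(size=(5, 128))
order = rank_videos(pred, video_feats)
print(order)  # video indices, best match first
```

At retrieval time, the video features come from a pretrained visual network, so only the text side needs to be learned, which suits the few-example setting the paper targets.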