To realize human-robot collaboration, robots need to execute actions for...
Spatio-temporal scene-graph approaches to video-based reasoning tasks su...
In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (...
Video captioning is an essential technology to understand scenes and des...
This paper addresses end-to-end automatic speech recognition (ASR) for l...
In contrast with previous approaches where information flows only toward...
The Audio-Visual Scene-aware Dialog (AVSD) task requires an agent to ind...
Generating video descriptions automatically is a challenging task that
i...
This paper introduces the Eighth Dialog System Technology Challenge. In ...
We introduce the task of scene-aware dialog. Given a follow-up question ...
This paper introduces the Seventh Dialog System Technology Challenges (D...
Dialog systems need to understand dynamic visual scenes in order to have...
Scene-aware dialog systems will be able to have conversations with users...
End-to-end training of neural networks is a promising approach to automa...
Currently successful methods for video description are based on
encoder-...