Learning to Attend Relevant Regions in Videos from Eye Fixations
Objects that draw viewers' attention in a video carry most of the semantics of the current frame. Information about human attention may be useful not only for entertainment applications (such as automatically generating commentary or tour guides) but also for robotic control, for example a robot that holds the laparoscope during laparoscopic surgery. In this work, we address the problem of attending to relevant objects in videos conditioned on eye fixations, using an RNN-based visual attention model. To the best of our knowledge, this is the first work to approach the problem with RNNs.