Inductive Attention for Video Action Anticipation

by Tsung-Ming Tai, et al.

Anticipating future actions from video observations is an important task in video understanding, and is useful for precautionary systems that need time to react before an event occurs. Because the input to action anticipation consists only of pre-action frames, models lack sufficient information about the target action; moreover, similar pre-action frames may lead to different futures. Consequently, any solution built on existing action recognition models can only be suboptimal. Recently, researchers have proposed using a longer video context to remedy the insufficient information in pre-action intervals, together with self-attention to query relevant past moments. However, using video input features as the query is indirect and might be inefficient, since these features serve only as a proxy for the anticipation goal. To this end, we propose an inductive attention model, which transparently uses the prior prediction as the query to derive the anticipation result by induction from past experience. Our method naturally accounts for the uncertainty of multiple futures via many-to-many association. On large-scale egocentric video datasets, our model not only performs consistently better than the state of the art using the same backbone and remains competitive with methods that employ a stronger backbone, but also achieves superior efficiency with fewer model parameters.
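The idea of using the prior prediction as the attention query can be illustrated with a minimal sketch. This is not the authors' architecture, which the abstract does not specify; the function names, the random projection matrices `w_query` and `w_out`, the uniform initial prediction, and the toy dimensions are all assumptions made for illustration. At each step the query is derived from the previous prediction rather than from the current frame feature, and the output is a distribution over actions, so several possible futures can stay active at once.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # Multiply a row-major matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    # Hypothetical stand-in for learned weights.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def anticipate(frames, w_query, w_out, num_classes):
    """Illustrative prediction-as-query attention loop (an assumption,
    not the paper's implementation)."""
    dim = len(frames[0])
    # Start from a uniform belief over future actions.
    pred = [1.0 / num_classes] * num_classes
    history = []
    for t in range(len(frames)):
        # The query comes from the prior prediction, not from the frame.
        query = matvec(w_query, pred)
        # Scaled dot-product scores against every frame observed so far.
        past = frames[: t + 1]
        scores = [
            sum(q * k for q, k in zip(query, f)) / math.sqrt(dim) for f in past
        ]
        weights = softmax(scores)
        # Attention-weighted summary of the past features.
        context = [
            sum(w * f[i] for w, f in zip(weights, past)) for i in range(dim)
        ]
        # Updated prediction: a distribution over possible future actions.
        pred = softmax(matvec(w_out, context))
        history.append(pred)
    return history


# Toy usage: 5 frames of 8-dimensional features, 4 action classes.
rng = random.Random(0)
frames = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
w_query = random_matrix(8, 4, rng)  # maps a prediction to a query vector
w_out = random_matrix(4, 8, rng)    # maps a context vector to class logits
preds = anticipate(frames, w_query, w_out, num_classes=4)
```

Here `preds` holds one probability vector per observed frame. In a trained model the two projections would be learned; the random weights only demonstrate the data flow.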




Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips

First-person action recognition is a challenging task in video understan...

Action Recognition in Untrimmed Videos with Composite Self-Attention Two-Stream Framework

With the rapid development of deep learning algorithms, action recogniti...

MAiVAR: Multimodal Audio-Image and Video Action Recognizer

Currently, action recognition is predominately performed on video data a...

Unified Recurrence Modeling for Video Action Anticipation

Forecasting future events based on evidence of current conditions is an ...

Information Elevation Network for Fast Online Action Detection

Online action detection (OAD) is a task that receives video segments wit...

Karma: Adaptive Video Streaming via Causal Sequence Modeling

Optimal adaptive bitrate (ABR) decision depends on a comprehensive chara...

Towards Streaming Egocentric Action Anticipation

Egocentric action anticipation is the task of predicting the future acti...
