Situation Recognition is the task of generating a structured summary of ...
While VideoQA Transformer models demonstrate competitive performance on
...
Humans have the natural ability to recognize actions even if the objects...
Coreference resolution aims at identifying words and phrases which refer...
Human-object interaction is one of the most important visual cues that h...
Abductive reasoning aims to make the most likely inference for a given s...
The problem of anticipating human actions is an inherently uncertain one...
Collection of real world annotations for training semantic segmentation
...
In recent years, neural implicit representations have made remarkable
pr...
We propose LocFormer, a Transformer-based model for video grounding whic...
Scene graph generation (SGG) aims to capture a wide variety of interacti...
Attention modules for Convolutional Neural Networks (CNNs) are an effect...
We present a new architecture for human action forecasting from videos. ...
We propose a framework for early action recognition and anticipation by
...
The use of latent variable models has shown to be a powerful tool for
mo...
Capsule networks (CapsNets) have recently shown promise to excel in most...
This paper studies the task of temporal moment localization in a long
un...
This paper presents a framework to recognize temporal compositions of at...
Future human action forecasting from partial observations of activities ...
Automatically generating natural language descriptions from an image is ...
We introduce a novel Recurrent Neural Network-based algorithm for future...
We introduce a novel Recurrent Neural Network-based algorithm for future...
This paper classifies human action sequences from videos using a machine...
Detecting temporal extents of human actions in videos is a challenging
c...
Action anticipation is critical in scenarios where one needs to react be...
Human action-anticipation methods predict what is the future action by
o...
The world is fundamentally compositional, so it is natural to think of v...
In this work, we present novel temporal encoding methods for action and
...
We present a principled approach to uncover the structure of visual data...
Most popular deep models for action recognition split video sequences in...
In contrast to the widely studied problem of recognizing an action given...
Purpose To develop a computer based method for the automated assessment ...
We introduce the concept of "dynamic image", a novel compact representat...
Existing image captioning models do not generalize well to out-of-domain...
We propose a new task of unsupervised action detection by action matchin...
We propose a new self-supervised CNN pre-training technique based on a n...
This paper introduces an extension of the backpropagation algorithm that...
Action recognition and anticipation are key to the success of many compu...
There is considerable interest in the task of automatically generating i...
We propose a function-based temporal pooling method that captures the la...
We present a supervised learning to rank algorithm that effectively orde...
In this work we focus on the problem of image caption generation. We pro...
This paper presents a novel multi scale gradient and a corner point base...
Domain adaptation aims at adapting the knowledge acquired on a source do...
Would it be possible to automatically associate ancient pictures to mode...
In this paper, we introduce a new domain adaptation (DA) algorithm where...