Contrastive learning allows us to flexibly define powerful losses by
con...
Many real-world video-text tasks involve different levels of granularity...
In this paper, we present a network architecture for video generation th...
Foreseeing the future is one of the key factors of intelligence. It invo...
The state of the art in video understanding suffers from two problems: (...
General human action recognition requires understanding of various visua...
CNN-based optical flow estimation has attracted attention recently, main...
Recent work has shown good recognition results in 3D object recognition ...