Learning computer vision models from (and for) movies has a long-standin...
Large-scale vision-language models (VLM) have shown impressive results f...
The recent introduction of the large-scale long-form MAD dataset for lan...
Modern machine learning pipelines are limited due to data availability,
...
This paper investigates the challenge of extracting highlight moments fr...
Video has become a dominant form of media. However, video editing interf...
Machine learning is transforming the video editing industry. Recent adva...
We propose a method for generating a temporally remapped video that matc...
Large-scale pretrained image-text models have shown incredible zero-shot...
Temporal action localization (TAL) is an important task extensively expl...
Continual learning (CL) is under-explored in the video domain. The few
e...
The recent and increasing interest in video-language research has driven...
Understanding movies and their structural patterns is a crucial task to
...
Video content creation keeps growing at an incredible pace; yet, creatin...
Among numerous videos shared on the web, well-edited ones always attract...
Humans are arguably one of the most important subjects in video streams,...
Active speaker detection requires a solid integration of multi-modal cue...
In deep CNN based models for semantic segmentation, high accuracy relies...
Current methods for active speak er detection focus on modeling short-te...
We present TDNet, a temporally distributed network designed for fast and...
We present TDNet, a temporally distributed network designed for fast and...
Video action detectors are usually trained using video datasets with ful...
The 3rd annual installment of the ActivityNet Large- Scale Activity
Reco...
Despite the recent progress in video understanding and the continuous ra...
The ActivityNet Large Scale Activity Recognition Challenge 2017 Summary:...
Traditional approaches for action detection use trimmed data to learn
so...