Fabian Caba Heilbron

research

∙ 08/18/2023

Long-range Multimodal Pretraining for Movie Understanding

Learning computer vision models from (and for) movies has a long-standin...

0 Dawit Mureja Argaw, et al. ∙

research

∙ 06/16/2023

Meta-Personalizing Vision-Language Models to Find Named Instances in Video

Large-scale vision-language models (VLM) have shown impressive results f...

0 Chun-Hsiao Yeh, et al. ∙

research

∙ 02/26/2023

Localizing Moments in Long Video Via Multimodal Guidance

The recent introduction of the large-scale long-form MAD dataset for lan...

0 Wayner Barrios, et al. ∙

research

∙ 12/09/2022

PIVOT: Prompting for Video Continual Learning

Modern machine learning pipelines are limited due to data availability, ...

8 Andrés Villa, et al. ∙

research

∙ 11/22/2022

Videogenic: Video Highlights via Photogenic Moments

This paper investigates the challenge of extracting highlight moments fr...

0 David Chuan-En Lin, et al. ∙

research

∙ 11/22/2022

VideoMap: Video Editing in Latent Space

Video has become a dominant form of media. However, video editing interf...

0 David Chuan-En Lin, et al. ∙

research

∙ 07/20/2022

The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing

Machine learning is transforming the video editing industry. Recent adva...

0 Dawit Mureja Argaw, et al. ∙

research

∙ 05/11/2022

Video-ReTime: Learning Temporally Varying Speediness for Time Remapping

We propose a method for generating a temporally remapped video that matc...

0 Simon Jenni, et al. ∙

research

∙ 03/24/2022

FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks

Large-scale pretrained image-text models have shown incredible zero-shot...

2 Santiago Castro, et al. ∙

research

∙ 02/10/2022

OWL (Observe, Watch, Listen): Localizing Actions in Egocentric Video via Audiovisual Temporal Context

Temporal action localization (TAL) is an important task extensively expl...

7 Merey Ramazanova, et al. ∙

research

∙ 01/23/2022

vCLIMB: A Novel Video Class Incremental Learning Benchmark

Continual learning (CL) is under-explored in the video domain. The few e...

5 Andrés Villa, et al. ∙

research

∙ 12/01/2021

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

The recent and increasing interest in video-language research has driven...

0 Mattia Soldan, et al. ∙

research

∙ 09/12/2021

MovieCuts: A New Dataset and Benchmark for Cut Type Recognition

Understanding movies and their structural patterns is a crucial task to ...

0 Alejandro Pardo, et al. ∙

research

∙ 08/09/2021

Learning to Cut by Watching Movies

Video content creation keeps growing at an incredible pace; yet, creatin...

9 Alejandro Pardo, et al. ∙

research

∙ 07/25/2021

Transcript to Video: Efficient Clip Sequencing from Texts

Among numerous videos shared on the web, well-edited ones always attract...

1 Yu Xiong, et al. ∙

research

∙ 06/03/2021

APES: Audiovisual Person Search in Untrimmed Video

Humans are arguably one of the most important subjects in video streams,...

0 Juan Leon Alcazar, et al. ∙

research

∙ 01/11/2021

MAAS: Multi-modal Assignation for Active Speaker Detection

Active speaker detection requires a solid integration of multi-modal cue...

0 Juan Leon Alcazar, et al. ∙

research

∙ 07/07/2020

Real-time Semantic Segmentation with Fast Attention

In deep CNN based models for semantic segmentation, high accuracy relies...

0 Ping Hu, et al. ∙

research

∙ 05/20/2020

Active Speakers in Context

Current methods for active speaker detection focus on modeling short-te...

0 Juan Leon Alcazar, et al. ∙

research

∙ 04/03/2020

Temporally Distributed Networks for Fast Video Segmentation

We present TDNet, a temporally distributed network designed for fast and...

13 Ping Hu, et al. ∙

research

∙ 04/03/2020

Temporally Distributed Networks for Fast Video Semantic Segmentation

We present TDNet, a temporally distributed network designed for fast and...

1 Ping Hu, et al. ∙

research

∙ 03/30/2019

RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization

Video action detectors are usually trained using video datasets with ful...

0 Humam Alwassel, et al. ∙

research

∙ 08/11/2018

The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary

The 3rd annual installment of the ActivityNet Large-Scale Activity Reco...

0 Bernard Ghanem, et al. ∙

research

∙ 07/27/2018

Diagnosing Error in Temporal Action Detectors

Despite the recent progress in video understanding and the continuous ra...

0 Humam Alwassel, et al. ∙

research

∙ 10/22/2017

ActivityNet Challenge 2017 Summary

The ActivityNet Large Scale Activity Recognition Challenge 2017 Summary:...

0 Bernard Ghanem, et al. ∙

research

∙ 06/13/2017

Action Search: Learning to Search for Human Activities in Untrimmed Videos

Traditional approaches for action detection use trimmed data to learn so...

0 Humam Alwassel, et al. ∙

