Open-vocabulary semantic segmentation is a challenging task that require...
The goal of the audio-visual segmentation (AVS) task is to segment the
s...
In semantic segmentation, adapting a visual system to novel object categ...
The objective of Audio-Visual Segmentation (AVS) is to localise the soun...
In this paper, we consider the problem of temporal action localization u...
Learning from a large corpus of data, pre-trained models have achieved
i...
Temporal sentence grounding aims to detect the event timestamps describe...
Weakly-supervised temporal action localization (WTAL) learns to detect a...
We present a simple yet effective self-supervised framework for audio-vi...
Visual-language pre-training has shown great success for learning joint
...
Weakly-supervised temporal action localization aims to localize actions ...
Point-Level temporal action localization (PTAL) aims to localize actions...
Recently, temporal action localization (TAL), i.e., finding specific act...