Weakly-supervised audio-visual violence detection aims to distinguish
sn...
Vision-Language Pre-training (VLP) with large-scale image-text pairs has...
Although audio-visual representation has been proved to be applicable in...
Visual-only self-supervised learning has achieved significant improvemen...
Recognizing and localizing events in videos is a fundamental task for vi...
We present a summary of the domain adaptive cascade R-CNN method for mit...
Audio-visual event localization aims to localize an event that is both
a...
When watching videos, the occurrence of a visual event is often accompan...
This study discusses an alternative tool for modeling student assessment...
Contemporary scientific exchanges are international, yet language contin...