As it is empirically observed that Vision Transformers (ViTs) are quite
...
Weakly-supervised temporal action localization (WTAL) is a practical yet...
The crux of label-efficient semantic segmentation is to produce high-qua...
Masked image modeling (MIM) has attracted much research attention due to...
Few-Shot transfer learning has become a major focus of research as it al...
In this paper, we propose a new approach to applying point-level annotat...
Sound source localization in visual scenes aims to localize objects emit...
Weakly supervised semantic segmentation based on image-level labels aims...