Large-scale image-text contrastive pre-training models, such as CLIP, ha...
Video-language pre-training models have recently significantly improved
...
As an important area in computer vision, object tracking has formed two
...
In this paper, we study an intermediate form of supervision, i.e.,
singl...
In this paper, we study object detection using a large pool of unlabeled...