Contrastive language-image pre-training (CLIP) serves as a de facto stan...
Masked autoencoders have become popular training paradigms for
self-supe...
The success of language Transformers is primarily attributed to the pret...
Computed Tomography (CT) plays an important role in monitoring
radiation...
Visual Object Tracking (VOT) can be seen as an extended task of Few-Shot...
Visual Object Tracking (VOT) has synchronous needs for both robustness a...
Learning cross-view consistent feature representations is the key to acc...
The problem of visual object tracking has traditionally been handled by
...