This paper investigates the problem of inertial navigation system (INS) ...
Transfer learning has become crucial in computer vision tasks due to the...
Despite the success of CLIP-based training recipes in vision-language mod...
We present SEED, an elaborate image tokenizer that empowers Large Langua...
This paper introduces DreamDiffusion, a novel method for generating high...
Visual foundation models like CLIP excel in learning feature representat...
Stickers have become a ubiquitous part of modern-day communication, conv...
Public large-scale text-to-image diffusion models, such as Stable Diffus...
The ultimate goal for foundation models is realizing task-agnostic, i.e....
We empirically investigate proper pre-training methods to build good vis...
Foundation models have achieved great advances in multi-task learning wi...
Tags are pivotal in facilitating the effective distribution of multimedi...
The state of the art in vision-language pretraining (VLP) achieves exem...
Large-scale embedding-based retrieval (EBR) is the cornerstone of search...
Both masked image modeling (MIM) and natural language supervision have f...
Though deep neural networks have achieved impressive success on various ...
To reproduce the success of text-to-image (T2I) generation, recent works...
Vector-Quantized (VQ-based) generative models usually consist of two bas...
The traditional model upgrading paradigm for retrieval requires recomput...
This paper addresses an important problem of ranking the pre-trained dee...
Since the development of self-supervised visual representation learning ...
The task of privacy-preserving model upgrades in image retrieval desires...
Dominant pre-training work for video-text retrieval mainly adopts the "du...
We present an approach to efficiently and effectively adapt a masked ima...
Image BERT pre-training with masked image modeling (MIM) has become a popul...
Conventional model upgrades for visual search systems require offline re...
Though remarkable progress has been achieved in various vision tasks, de...
The task of hot-refresh model upgrades of image retrieval systems plays ...
Pre-training a model to learn transferable video-text representation for...
Vision Transformer (ViT) and its variants (e.g., Swin, PVT) have achieve...
Conditional generative adversarial networks (cGANs) aim at synthesizi...
Correspondence selection between two groups of feature points aims to co...
In this technical report, we present our submission to the VisDA Challen...
The task of large-scale retrieval-based image localization is to estimat...
Domain adaptive object re-ID aims to transfer the learned knowledge from...
Unsupervised domain adaptation (UDA) aims at adapting the model trained ...
Person re-identification (re-ID) aims at identifying the same persons' i...
Person re-identification (reID) is an important task that requires to re...