Recent advances in vision-language pre-training have enabled machines to...
Our goal in this research is to study a more realistic environment in wh...
Deep learning became the game changer for image retrieval soon after it ...
In recent years, with the increasing demand for public safety and the ra...
The misalignment of human images caused by pedestrian detection bounding...
In this paper, we aim to advance the research of multi-modal pre-trainin...
Exploring fine-grained relationship between entities(e.g. objects in ima...