Point cloud-based open-vocabulary 3D object detection aims to detect 3D
...
Despite the stunning ability to generate high-quality images by recent
t...
The field of generative AI has a transformative impact on various areas,...
In this work, we propose SAM3D, a novel framework that is able to predic...
The ultimate goal for foundation models is realizing task-agnostic, i.e....
Photos serve as a way for humans to record what they experience in their...
Diffusion models have attained impressive visual quality for image synth...
We propose a simple, efficient, yet powerful framework for dense visual
...
Shape can specify key object constraints, yet existing text-to-image
dif...
As a pioneering work exploring transformer architecture for 3D point clo...
Test-time adaptation harnesses test inputs to improve the accuracy of a ...
We introduce ArtBench-10, the first class-balanced, high-quality, cleanl...
Dominant pre-training work for video-text retrieval mainly adopt the
"du...
Pre-training a model to learn transferable video-text representation for...
We propose a novel algorithm, named Open-Edit, which is the first attemp...
Semantic image synthesis aims at generating photorealistic images from
s...
Text-image cross-modal retrieval is a challenging task in the field of
l...
Referring expression grounding aims at locating certain objects or perso...
Pedestrian attribute recognition has attracted many attentions due to it...
Person re-identification is an important task that requires learning
dis...
The aim of image captioning is to generate similar captions by machine a...
Pedestrian analysis plays a vital role in intelligent video surveillance...
Object detection in videos has drawn increasing attention recently with ...