Autonomous driving systems generally employ separate models for differen...
Stable Diffusion, a generative model used in text-to-image synthesis, fr...
Cross-modal pre-training has shown impressive performance on a wide rang...
Recently, large-scale diffusion models, e.g., Stable Diffusion and DALL-E...
Given a natural language, a general robot has to comprehend the instruct...
Recent advances in text-to-image diffusion models have achieved remarkab...
In recent years, the field of computer vision has seen significant advan...
This paper presents DetCLIPv2, an efficient and scalable training framew...
Contrastive Language-Image Pre-training, benefiting from large-scale unl...
Existing open-world universal segmentation approaches usually leverage C...
Benefiting from large-scale vision-language pre-training on image-text p...
Multi-task learning has emerged as a powerful paradigm to solve a range ...
Large-scale cross-modal pre-training paradigms have recently shown ubiqu...
Inspired by the success of visual-language methods (VLMs) in zero-shot c...
Vision-language pre-training (VLP) has attracted increasing attention re...
Open-world object detection, as a more general and challenging goal, aim...
Aiming towards a holistic understanding of multiple downstream tasks sim...
To bridge the gap between supervised semantic segmentation and real-worl...
Self-supervised learning (SSL), especially contrastive methods, has rais...
We present ONCE-3DLanes, a real-world autonomous driving dataset with la...
We present Laneformer, a conceptually simple yet powerful transformer-ba...
Contemporary deep-learning object detection methods for autonomous drivi...
Aiming at facilitating a real-world, ever-evolving and scalable autonomo...