Transformers have shown superior performance on various vision tasks. Th...
In recent years, Text-to-Image (T2I) models have been extensively studie...
Most pre-trained learning systems are known to suffer from bias, which t...
Bird's-Eye View (BEV) features are popular intermediate scene representa...
Despite the recent impressive breakthroughs in text-to-image generation,...
Pre-trained language models (LMs) have shown remarkable reasoning perfor...
Recent methods in self-supervised learning have demonstrated that maskin...
World models in model-based reinforcement learning usually face unrealis...
In this paper, we develop a general framework based on the Transformer a...
Controllable text generation concerns two fundamental tasks of wide appl...
Transformers have recently shown superior performance on various vision...
Masked language models have revolutionized natural language processing s...
Learning disentangled representations leads to interpretable models and ...
Feature learning for 3D object detection from point clouds is very chall...
We present an approach with a novel differentiable flow-to-depth layer f...
We present Uncertainty-aware Cascaded Stereo Network (UCS-Net) for 3D re...
Recently, autonomous driving development ignited competition among car m...
In spite of achieving revolutionary successes in machine learning, deep ...