Denoising Diffusion Probabilistic Models (DDPMs) have recently achieved
...
We present an innovative approach to 3D Human Pose Estimation (3D-HPE) b...
Large Language Models (LLMs) have made the ambitious quest for generalis...
Self-supervised learning can be used for mitigating the greedy needs of
...
Large-scale text-to-image diffusion models have significantly improved t...
Large multimodal models trained on natural documents, which interleave i...
Motion forecasting plays a critical role in enabling robots to anticipat...
Despite advances in Visual Question Answering (VQA), the ability of mode...
Foundation models are first pre-trained on vast unsupervised datasets an...
Large Language Models (LLMs) have so far impressed the world, with
unpre...
Deep neural networks (DNNs) are nowadays ubiquitous in many domains such...
Foundation models are redefining how AI systems are built. Practitioners...
We introduce submodel co-training, a regularization method related to
co...
Vision-Language Pretraining (VLP) and Foundation models have been the go...
Continual learning for segmentation has recently seen increasing interes...
Nowadays, deep vision models are being widely deployed in safety-critica...
Image generation has recently seen tremendous advances, with diffusion m...
Vision and Language Pretraining has become the prevalent approach for
ta...
The leap in performance in state-of-the-art computer vision methods is
a...
Recent works in autonomous driving have widely adopted the bird's-eye-vi...
Transformers have been matching deep convolutional networks for vision
a...
Deep architecture have proven capable of solving many tasks provided a
s...
Multi-input multi-output architectures propose to train multiple subnetw...
Standard neural networks struggle to generalize under distribution shift...
Unsupervised Domain Adaptation (UDA) is a transfer learning task which a...
Cross-modal image-recipe retrieval has gained significant attention in r...
A Vision Transformer (ViT) is a simple neural architecture amenable to s...
Deep neural networks (DNNs) are nowadays ubiquitous in the computer visi...
Computationally expensive neural networks are ubiquitous in computer vis...
After their initial success in natural language processing, transformer
...
Deep generative models, like GANs, have considerably improved the state ...
We show how to augment any convolutional network with an attention-based...
With the rapid advances in generative adversarial networks (GANs), the v...
Latent text representations exhibit geometric regularities, such as the
...
Deep network architectures struggle to continually learn new tasks witho...
As deep learning models are increasingly used in safety-critical
applica...
We describe a novel attribution method which is grounded in Sensitivity
...
Pruning Deep Neural Networks (DNNs) is a prominent field of study in the...
Learning-based trajectory prediction models have encountered great succe...
Vision-based depth estimation is a key feature in autonomous systems, wh...
Learning robust models that generalize well under changes in the data
di...
In this work, we address the task of unsupervised domain adaptation (UDA...
Deep learning approaches are nowadays ubiquitously used to tackle comput...
Despite the recent progress of generative adversarial networks (GANs) at...
Deep Neural Networks (DNNs) are ubiquitous in today's computer vision
la...
We present ResMLP, an architecture built entirely upon multi-layer
perce...
We introduce an evaluation methodology for visual question answering (VQ...
Transformers have been recently adapted for large scale image classifica...
Recent strategies achieved ensembling "for free" by fitting concurrently...
Deep ensembles perform better than a single network thanks to the divers...