We introduce Three Towers (3T), a flexible method to improve the contras...
Scaling laws have been recently employed to derive compute-optimal model...
There has been a recent explosion of computer vision models which perfor...
We propose a simple pairwise sigmoid loss for image-text pre-training. U...
Misalignment between model predictions and intended usage can be detrime...
Vision Transformers convert images to sequences by slicing them into pat...
While deep learning models have replaced hand-designed features across m...
Effective scaling and a flexible task interface enable large language mo...
We introduce UViM, a unified approach capable of modeling a wide range o...
It is commonly accepted that the Vision Transformer model requires sophi...
This work presents a simple vision transformer design as a strong baseli...
This paper presents contrastive-tuning, a simple method employing contra...
Model efficiency is a critical aspect of developing and deploying machin...
Vision Transformers (ViT) have been shown to attain highly competitive p...
There is a growing discrepancy in computer vision between large-scale mo...
Attention-based neural networks such as the Vision Transformer (ViT) hav...
Convolutional Neural Networks (CNNs) are the go-to model for computer vi...
Before deploying machine learning models it is critical to assess their ...
While the Transformer architecture has become the de-facto standard for ...
Modern deep convolutional networks (CNNs) are often criticized for not g...
Yes, and no. We ask whether recent progress on the ImageNet classificati...
Transfer of pre-trained representations improves sample efficiency and s...
An agent learning through interactions should balance its action selecti...
We propose a learning algorithm capable of learning from label proportio...
This work tackles the problem of semi-supervised learning of image class...
Unsupervised visual representation learning remains a largely unsolved p...
In the past decade many robots were deployed in the wild, and people det...
Detecting humans is a key skill for mobile robots and intelligent vehicl...
Recent progress in Reinforcement Learning (RL), fueled by its combinatio...
With the rise of end-to-end learning through deep learning, person detec...
In the past few years, the field of computer vision has gone through a r...
We introduce the DROW detector, a deep learning based detector for 2D ra...