The ubiquitous and demonstrably suboptimal choice of resizing images to ...
Contrastive image-text models such as CLIP form the building blocks of m...
Understanding verbs is crucial to modelling how people and objects inter...
Vision Transformers convert images to sequences by slicing them into pat...
Pixel-level labels are particularly expensive to acquire. Hence, pretrai...
We study class-incremental learning, a training setup in which new class...
We propose Masked Siamese Networks (MSN), a self-supervised learning fra...
Discriminative self-supervised learning allows training models on any ra...
Information retrieval is an important component in natural language proc...
Following their success in natural language processing, transformers hav...
We present ResMLP, an architecture built entirely upon multi-layer perce...
In this paper, we question if self-supervised learning provides new prop...
This paper proposes a novel method of learning by predicting view assign...
Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and S...
Unsupervised image representations have significantly reduced the gap wi...
Convolutional neural networks trained without supervision come close to ...
Pre-training general-purpose visual features with convolutional neural n...
Clustering is a class of unsupervised learning methods that has been ext...