Recent contrastive language image pre-training has led to learning highl...
Recent advances in zero-shot image recognition suggest that vision-langu...
In this paper, we propose self-supervised training for video transformer...
Vision transformers (ViTs) process input images as sequences of patches ...
Deep neural networks have achieved remarkable performance on a range of
...
Although deep learning has achieved appealing results on several machine...
Tracking multiple objects in real time is essential for a variety of
rea...
We tackle the panoptic segmentation problem with a conditional random fi...
Activity recognition in videos in a deep-learning setting---or
otherwise...