The ubiquitous and demonstrably suboptimal choice of resizing images to ...
Open-vocabulary object detection has benefited greatly from pretrained v...
Vision Transformers convert images to sequences by slicing them into pat...
Semantic segmentation labels are expensive and time-consuming to acquire...
Scenic is an open-source JAX library with a focus on Transformer-based m...
Accurate estimation of predictive uncertainty (model calibration) is ess...
While the Transformer architecture has become the de facto standard for ...
Modern deep convolutional networks (CNNs) are often criticized for not g...
In self-supervised visual representation learning, a feature extractor i...
Extracting and predicting object structure and dynamics from videos with...