Carl Doersch

research

∙ 08/30/2023

RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

For robots to be useful outside labs and specialized factories we need a...

Mel Večerík, et al.

research

∙ 06/14/2023

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

We present a novel model for Tracking Any Point (TAP) that effectively t...

Carl Doersch, et al.

research

∙ 05/23/2023

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

We propose a novel multimodal video benchmark - the Perception Test - to...

Viorica Patraucean, et al.

research

∙ 11/07/2022

TAP-Vid: A Benchmark for Tracking Any Point in a Video

Generic motion understanding from video involves not only tracking objec...

Carl Doersch, et al.

research

∙ 12/06/2021

Input-level Inductive Biases for 3D Reconstruction

Much of the recent progress in 3D vision has been driven by the developm...

Wang Yifan, et al.

research

∙ 07/30/2021

Perceiver IO: A General Architecture for Structured Inputs & Outputs

The recently-proposed Perceiver model obtains good results on several do...

Andrew Jaegle, et al.

research

∙ 06/26/2021

Inferring a Continuous Distribution of Atom Coordinates from Cryo-EM Images using VAEs

Cryo-electron microscopy (cryo-EM) has revolutionized experimental prote...

Dan Rosenbaum, et al.

research

∙ 07/22/2020

CrossTransformers: spatially-aware few-shot transfer

Given new tasks with very little data–such as new classes in a classific...

Carl Doersch, et al.

research

∙ 06/13/2020

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-su...

Jean-Bastien Grill, et al.

research

∙ 07/04/2019

Sim2real transfer learning for 3D pose estimation: motion to the rescue

Simulation is an anonymous, low-bias source of data where annotation can...

Carl Doersch, et al.

research

∙ 05/22/2019

Data-Efficient Image Recognition with Contrastive Predictive Coding

Large scale deep learning excels when labeled images are abundant, yet d...

Olivier J. Hénaff, et al.

research

∙ 05/10/2019

Exploiting temporal context for 3D human pose estimation in the wild

We present a bundle-adjustment-based algorithm for recovering accurate 3...

Anurag Arnab, et al.

research

∙ 04/05/2019

Structured agents for physical construction

Physical construction -- the ability to compose objects, subject to phys...

Victor Bapst, et al.

research

∙ 12/06/2018

Video Action Transformer Network

We introduce the Action Transformer model for recognizing and localizing...

Rohit Girdhar, et al.

research

∙ 09/11/2018

The Visual QA Devil in the Details: The Impact of Early Fusion and Batch Norm on CLEVR

Visual QA is a pivotal challenge for higher-level reasoning, requiring u...

Mateusz Malinowski, et al.

research

∙ 08/01/2018

Learning Visual Question Answering by Bootstrapping Hard Attention

Attention mechanisms in biological perception are thought to select subs...

Mateusz Malinowski, et al.

research

∙ 07/26/2018

A Better Baseline for AVA

We introduce a simple baseline for action localization on the AVA datase...

Rohit Girdhar, et al.

research

∙ 03/10/2018

Kickstarting Deep Reinforcement Learning

We present a method for using previously-trained 'teacher' agents to kic...

Simon Schmitt, et al.

research

∙ 08/25/2017

Multi-task Self-Supervised Visual Learning

We investigate methods for combining multiple self-supervised tasks--i.e...

Carl Doersch, et al.

research

∙ 06/25/2016

An Uncertain Future: Forecasting from Static Images using Variational Autoencoders

In a given scene, humans can often easily predict a set of immediate fut...

Jacob Walker, et al.

research

∙ 06/19/2016

Tutorial on Variational Autoencoders

In just three years, Variational Autoencoders (VAEs) have emerged as one...

Carl Doersch, et al.

research

∙ 04/27/2015

Mid-level Elements for Object Detection

Building on the success of recent discriminative mid-level elements, we ...

Aayush Bansal, et al.
