Adria Recasens

research

∙ 01/23/2023

Zorro: the masked multimodal transformer

Attention-based models are appealing for multimodal processing because i...

Adria Recasens, et al.

research

∙ 11/07/2022

TAP-Vid: A Benchmark for Tracking Any Point in a Video

Generic motion understanding from video involves not only tracking objec...

Carl Doersch, et al.

research

∙ 02/22/2022

Hierarchical Perceiver

General perception systems such as Perceivers can process arbitrary moda...

Joao Carreira, et al.

research

∙ 11/23/2021

Towards Learning Universal Audio Representations

The ability to learn universal audio representations that can solve dive...

Luyu Wang, et al.

research

∙ 04/26/2021

Multimodal Self-Supervised Learning of General Audio Representations

We present a multimodal framework to learn general audio representations...

Luyu Wang, et al.

research

∙ 03/30/2021

Broaden Your Views for Self-Supervised Video Learning

Most successful self-supervised learning methods are trained to align th...

Adria Recasens, et al.

research

∙ 02/09/2021

A Deep Learning Approach for Characterizing Major Galaxy Mergers

Fine-grained estimation of galaxy merger stages from observations is a k...

Skanda Koppula, et al.

research

∙ 11/18/2020

Game Plan: What AI can do for Football, and What Football can do for AI

The rapid progress in artificial intelligence (AI) and machine learning ...

Karl Tuyls, et al.

research

∙ 06/29/2020

Self-Supervised MultiModal Versatile Networks

Videos are a rich source of multi-modal supervision. In this work, we le...

Jean-Baptiste Alayrac, et al.

research

∙ 03/30/2020

Context Based Emotion Recognition using EMOTIC Dataset

In our everyday lives and social interactions we often try to perceive t...

Ronak Kosti, et al.

research

∙ 10/22/2019

Gaze360: Physically Unconstrained Gaze Estimation in the Wild

Understanding where people are looking is an informative social cue. In ...

Petr Kellnhofer, et al.

research

∙ 09/10/2018

Learning to Zoom: a Saliency-Based Sampling Layer for Neural Networks

We introduce a saliency-based distortion layer for convolutional neural ...

Adria Recasens, et al.

research

∙ 07/27/2018

Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics

Widely used in news, business, and educational media, infographics are h...

Spandan Madan, et al.

research

∙ 04/04/2018

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

In this paper, we explore neural network models that learn to associate ...

David Harwath, et al.

research

∙ 09/26/2017

Understanding Infographics through Textual and Visual Tag Prediction

We introduce the problem of visual hashtag discovery for infographics: e...

Zoya Bylinskii, et al.

research

∙ 12/09/2016

Following Gaze Across Views

Following the gaze of people inside videos is an important signal for un...

Adria Recasens, et al.
