Large-scale vision-language models (VLM) have shown impressive results f...
We propose a method to recommend music for an input video while allowing...
The sound effects that designers add to videos are designed to convey a
...
We propose a self-supervised approach for learning to perform audio sour...
We study the recent progress on dynamic view synthesis (DVS) from monocu...
We present an approach for recommending a music track for a given video,...
We introduce an approach for selecting objects in neural volumetric 3D
r...
We introduce FocalPose, a neural render-and-compare method for jointly
e...
Training supervised image synthesis models requires a critic to compare ...
We introduce the task of spatially localizing narrated interactions in
v...
We introduce the task of weakly supervised learning for detecting human ...
A neural radiance field (NeRF) is a scene model supporting high-quality ...
Existing deep models predict 2D and 3D kinematic poses from video that a...
Self-supervised audio-visual learning aims to capture useful representat...
Self-supervised audio-visual learning aims to capture useful representat...
Estimating 3D hand pose from single RGB images is a highly ambiguous pro...
We introduce a method to generate videos of dynamic virtual objects plau...
In this paper, we introduce the task of retrieving relevant video moment...
We introduce an approach to model surface properties governing bounces i...
In video production, inserting B-roll is a widely used technique to enri...
Localizing moments in a longer video via natural language queries is a n...
Human shape estimation is an important task for video editing, animation...
Knowing where people look and click on visual designs can provide clues ...
We consider retrieving a specific temporal segment, or moment, from a vi...
In this work, we introduce a new video representation for action
classif...
We explore design principles for general pixel-level prediction problems...
We explore architectures for general pixel-level prediction problems, fr...
We introduce an approach that leverages surface normal predictions, alon...
This paper presents an end-to-end convolutional neural network (CNN) for...
We introduce an approach for analyzing the variation of features generat...