We introduce the novel-view acoustic synthesis (NVAS) task: given the si...
Photorealistic avatars of human faces have come a long way in recent yea...
In this work, we present an end-to-end binaural speech synthesis system ...
We present a single-stage casual waveform-to-waveform multichannel model...
Since facial actions such as lip movements contain significant informati...
Neural face avatars that are trained from multi-view data captured in ca...
Speech enhancement is a critical component of many user-oriented audio
a...
Impulse response estimation in high noise and in-the-wild settings, with...
This paper presents a generic method for generating full facial 3D anima...
Codec Avatars are a recent class of learned, photorealistic face models ...
Action segmentation is the task of temporally segmenting every frame of ...
Action recognition has become a rapidly developing research field within...
Action recognition is so far mainly focusing on the problem of classific...
Video learning is an important task in computer vision and has experienc...
We propose a new paradigm for advancing 3D semantic scene completion bas...
Analyzing human actions in videos has gained increased attention recentl...
Action recognition is a fundamental problem in computer vision with a lo...
Action detection and temporal segmentation of actions in videos are topi...
We present an approach for weakly supervised learning of human actions. ...
The traditional bag-of-words approach has found a wide range of applicat...
We present an approach for weakly supervised learning of human actions f...