Fully-test-time adaptation (F-TTA) can mitigate performance loss due to ...
We present Spatial LibriSpeech, a spatial audio dataset with over 650 ho...
Preference-based reinforcement learning (RL) algorithms help avoid the p...
Human skeleton point clouds are commonly used to automatically classify ...
Synthesizing natural head motion to accompany speech for an embodied con...
Generating realistic lip motions to simulate speech production is key fo...
Automatic speech recognition (ASR) is widely used in consumer electronic...
To detect bias in face recognition networks, it can be useful to probe a...
We describe our novel deep learning approach for driving animated faces ...
We present an introspection of an audiovisual speech enhancement model. ...
Speech-driven visual speech synthesis involves mapping features extracte...
We describe experiments towards building a conversational digital assist...
We propose a method for modeling and learning turn-taking behaviors for ...
A critical assumption of all current visual speech recognition systems i...
In the quest for greater computer lip-reading performance there are a nu...
Visual-only speech recognition is dependent upon a number of factors tha...