Learning high-quality video representation has shown significant applica...
Few-shot segmentation (FSS) is a dense prediction task that aims to infe...
In this paper, we introduce a large-scale and high-quality audio-visual
...
Visual feedback plays a crucial role in the process of amputation patien...
Beyond speech, the art of communication includes gestures. The automatic...
This paper further explores our previous wake word spotting system ranke...
Target-speaker voice activity detection is currently a promising approac...
Scene flow is the collection of per-point motion information in the 3D ...
High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to
...
Learning the embeddings for urban regions from human mobility data can r...
Semantic segmentation of building facades is significant in various
appli...
Long-term visual localization under changing environments is a challengi...
In recent years, surveillance cameras have been widely deployed in public plac...
This paper presents a dual camera system for high spatiotemporal resolut...
This paper proposes a new end-to-end trainable matching network based on...
Networked video applications, e.g., video conferencing, often suffer fro...
We present a novel deep convolutional network pipeline, LO-Net, for real...
Building open domain conversational systems that allow users to have eng...
High-speed trains (HSTs) are being widely deployed around the world. To ...
Conversational agents are exploding in popularity. However, much work re...
Geometric model fitting is a fundamental task in computer graphics and
c...