We present DiverseMotion, a new approach for synthesizing high-quality h...
Predicting stock prices presents a challenging research problem due to t...
Vision-language navigation (VLN), which entails an agent to navigate 3D
...
In this study, we focus on the problem of 3D human mesh recovery from a
...
Point cloud analysis (such as 3D segmentation and detection) is a challe...
We introduce a novel speaker model Kefa for navigation instruction
gener...
This report presents ReLER submission to two tracks in the Ego4D Episodi...
Language-guided image retrieval enables users to search for images and
i...
Misalignment between the outputs of a vision-language (VL) model and tas...
This paper presents a whitening-based contrastive learning method for
se...
Temporal action localization (TAL), which involves recognizing and locat...
Pre-trained vision-language models are the de-facto foundation models fo...
In this paper, we tackle the problem of sign language translation (SLT)
...
Multimodal Knowledge Graph Construction (MKGC) involves creating structu...
Scaling language models have revolutionized widespread NLP tasks, yet li...
Video-based 3D human pose and shape estimations are evaluated by intra-f...
Recently, visual-language navigation (VLN) – entailing robot agents to
f...
Though transfer learning is promising to increase the learning efficienc...
In this report, we present the ReLER@ZJU1 submission to the Ego4D Moment...
Knowledge Graphs (KGs) often have two characteristics: heterogeneous gra...
Self-supervised learning makes great progress in large model pre-trainin...
Multivariate long sequence time-series forecasting (M-LSTF) is a practic...
Sign language is the window for people differently-abled to express thei...
In this report, we present the ReLER@ZJU-Alibaba submission to the Ego4D...
Recently, large-scale pre-training methods like CLIP have made great pro...
This paper investigates how to realize better and more efficient embeddi...
It is of great challenge, though promising, to coordinate collective rob...
Reducing redundancy is crucial for improving the efficiency of video
rec...
We propose PR-RRN, a novel neural-network based method for Non-rigid
Str...
Obtaining viewer responses from videos can be useful for creators and
st...
Vehicle search is one basic task for the efficient traffic management in...
Text-video retrieval is a challenging task that aims to search relevant ...
Anticipating actions before they are executed is crucial for a wide rang...
Egocentric video recognition is a natural testbed for diverse interactio...
For the problem of 3D object recognition, researchers using deep learnin...
In this report, we present the Baidu-UTS submission to the EPIC-Kitchens...
Deep convolutional neural networks (CNNs) have enjoyed tremendous succes...