Xiaohan Wang

research

∙ 09/04/2023

DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion

We present DiverseMotion, a new approach for synthesizing high-quality h...

0 Yunhong Lou, et al. ∙

research

∙ 08/09/2023

Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey

Predicting stock prices presents a challenging research problem due to t...

0 Liping Wang, et al. ∙

research

∙ 08/09/2023

Bird's-Eye-View Scene Graph for Vision-Language Navigation

Vision-language navigation (VLN), which entails an agent to navigate 3D ...

0 Rui Liu, et al. ∙

research

∙ 07/31/2023

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

In this study, we focus on the problem of 3D human mesh recovery from a ...

0 Jiahao Li, et al. ∙

research

∙ 07/27/2023

Clustering based Point Cloud Representation Learning for 3D Analysis

Point cloud analysis (such as 3D segmentation and detection) is a challe...

0 Tuo Feng, et al. ∙

research

∙ 07/25/2023

Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for Navigation Instruction Generation

We introduce a novel speaker model Kefa for navigation instruction gener...

0 Haitian Zeng, et al. ∙

research

∙ 06/15/2023

Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023

This report presents the ReLER submission to two tracks in the Ego4D Episodi...

0 Jiayi Shao, et al. ∙

research

∙ 06/03/2023

Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval

Language-guided image retrieval enables users to search for images and i...

0 Xu Zhang, et al. ∙

research

∙ 05/29/2023

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

Misalignment between the outputs of a vision-language (VL) model and tas...

0 Shuai Zhao, et al. ∙

research

∙ 05/28/2023

Whitening-based Contrastive Learning of Sentence Embeddings

This paper presents a whitening-based contrastive learning method for se...

0 Wenjie Zhuo, et al. ∙

research

∙ 05/25/2023

Action Sensitivity Learning for Temporal Action Localization

Temporal action localization (TAL), which involves recognizing and locat...

0 Jiayi Shao, et al. ∙

research

∙ 05/23/2023

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

Pre-trained vision-language models are the de-facto foundation models fo...

0 Shuai Zhao, et al. ∙

research

∙ 05/22/2023

Gloss-Free End-to-End Sign Language Translation

In this paper, we tackle the problem of sign language translation (SLT) ...

0 Kezhou Lin, et al. ∙

research

∙ 05/15/2023

Continual Multimodal Knowledge Graph Construction

Multimodal Knowledge Graph Construction (MKGC) involves creating structu...

0 Xiang Chen, et al. ∙

research

∙ 05/02/2023

How to Unleash the Power of Large Language Models for Few-shot Relation Extraction?

Scaling language models have revolutionized widespread NLP tasks, yet li...

0 Xin Xu, et al. ∙

research

∙ 03/26/2023

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

Video-based 3D human pose and shape estimations are evaluated by intra-f...

0 Xiaolong Shen, et al. ∙

research

∙ 03/15/2023

Lana: A Language-Capable Navigator for Instruction Following and Generation

Recently, visual-language navigation (VLN) – entailing robot agents to f...

0 Xiaohan Wang, et al. ∙

research

∙ 12/07/2022

Policy Transfer via Enhanced Action Space

Though transfer learning is promising to increase the learning efficienc...

0 Zheng Zhang, et al. ∙

research

∙ 11/17/2022

ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022

In this report, we present the ReLER@ZJU submission to the Ego4D Moment...

0 Jiayi Shao, et al. ∙

research

∙ 10/01/2022

PromptKG: A Prompt Learning Framework for Knowledge Graph Representation Learning and Application

Knowledge Graphs (KGs) often have two characteristics: heterogeneous gra...

0 Xin Xie, et al. ∙

research

∙ 09/30/2022

Slimmable Networks for Contrastive Self-supervised Learning

Self-supervised learning makes great progress in large model pre-trainin...

57 Shuai Zhao, et al. ∙

research

∙ 07/16/2022

Generalizable Memory-driven Transformer for Multivariate Long Sequence Time-series Forecasting

Multivariate long sequence time-series forecasting (M-LSTF) is a practic...

1 Mingjie Li, et al. ∙

research

∙ 07/08/2022

Jointly Harnessing Prior Structures and Temporal Consistency for Sign Language Video Generation

Sign language is the window for people differently-abled to express thei...

2 Yucheng Suo, et al. ∙

research

∙ 07/01/2022

ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022

In this report, we present the ReLER@ZJU-Alibaba submission to the Ego4D...

1 Naiyuan Liu, et al. ∙

research

∙ 05/02/2022

CenterCLIP: Token Clustering for Efficient Text-Video Retrieval

Recently, large-scale pre-training methods like CLIP have made great pro...

3 Shuai Zhao, et al. ∙

research

∙ 03/22/2022

Associating Objects with Scalable Transformers for Video Object Segmentation

This paper investigates how to realize better and more efficient embeddi...

5 Zongxin Yang, et al. ∙

research

∙ 03/09/2022

Multi-robot Cooperative Pursuit via Potential Field-Enhanced Reinforcement Learning

It is of great challenge, though promising, to coordinate collective rob...

0 Zheng Zhang, et al. ∙

research

∙ 01/17/2022

Action Keypoint Network for Efficient Video Recognition

Reducing redundancy is crucial for improving the efficiency of video rec...

11 Xu Chen, et al. ∙

research

∙ 08/17/2021

PR-RRN: Pairwise-Regularized Residual-Recursive Networks for Non-rigid Structure-from-Motion

We propose PR-RRN, a novel neural-network based method for Non-rigid Str...

0 Haitian Zeng, et al. ∙

research

∙ 06/03/2021

Less is More: Sparse Sampling for Dense Reaction Predictions

Obtaining viewer responses from videos can be useful for creators and st...

14 Kezhou Lin, et al. ∙

research

∙ 05/31/2021

Connecting Language and Vision for Natural Language-Based Vehicle Retrieval

Vehicle search is one basic task for the efficient traffic management in...

20 Shuai Bai, et al. ∙

research

∙ 04/20/2021

T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval

Text-video retrieval is a challenging task that aims to search relevant ...

0 Xiaohan Wang, et al. ∙

research

∙ 01/13/2021

Learning to Anticipate Egocentric Actions by Imagination

Anticipating actions before they are executed is crucial for a wide rang...

0 Yu Wu, et al. ∙

research

∙ 02/08/2020

Symbiotic Attention with Privileged Information for Egocentric Action Recognition

Egocentric video recognition is a natural testbed for diverse interactio...

0 Xiaohan Wang, et al. ∙

research

∙ 02/08/2020

Variable-Viewpoint Representations for 3D Object Recognition

For the problem of 3D object recognition, researchers using deep learnin...

12 Tengyu Ma, et al. ∙

research

∙ 06/22/2019

Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019

In this report, we present the Baidu-UTS submission to the EPIC-Kitchens...

0 Xiaohan Wang, et al. ∙

research

∙ 06/15/2018

Seeing Neural Networks Through a Box of Toys: The Toybox Dataset of Visual Object Transformations

Deep convolutional neural networks (CNNs) have enjoyed tremendous succes...

0 Xiaohan Wang, et al. ∙
