Jie Lei

research

∙ 06/28/2023

AFPN: Asymptotic Feature Pyramid Network for Object Detection

Multi-scale features are of great importance in encoding objects with sc...

0 Guoyu Yang, et al. ∙

research

∙ 04/10/2023

ViT-Calibrator: Decision Stream Calibration for Vision Transformer

A surge of interest has emerged in utilizing Transformers in diverse vis...

0 Lin Chen, et al. ∙

research

∙ 04/09/2023

Propheter: Prophetic Teacher Guided Long-Tailed Distribution Learning

The problem of deep long-tailed learning, a prevalent challenge in the r...

0 Wenxiang Xu, et al. ∙

research

∙ 02/15/2023

Toward matrix multiplication for deep learning inference on the Xilinx Versal

The remarkable positive impact of Deep Neural Networks on many Artificia...

0 Jie Lei, et al. ∙

research

∙ 12/31/2022

Guided Hybrid Quantization for Object detection in Multimodal Remote Sensing Imagery via One-to-one Self-teaching

Considering the computation complexity, we propose a Guided Hybrid Quant...

0 Jiaqing Zhang, et al. ∙

research

∙ 12/15/2022

Vision Transformers are Parameter-Efficient Audio-Visual Learners

Vision transformers (ViTs) have achieved impressive results on various c...

1 Yan-Bo Lin, et al. ∙

research

∙ 12/09/2022

VindLU: A Recipe for Effective Video-and-Language Pretraining

The last several years have witnessed remarkable progress in video-and-l...

6 Feng Cheng, et al. ∙

research

∙ 11/21/2022

Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention

We present Perceiver-VL, a vision-and-language framework that efficientl...

6 Zineng Tang, et al. ∙

research

∙ 09/27/2022

SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery

In this paper, we propose an accurate yet fast small object detection me...

0 Jiaqing Zhang, et al. ∙

research

∙ 07/27/2022

Mid-level Representation Enhancement and Graph Embedded Uncertainty Suppressing for Facial Expression Recognition

Facial expression is an essential factor in conveying human emotional st...

0 Jie Lei, et al. ∙

research

∙ 06/07/2022

Revealing Single Frame Bias for Video-and-Language Learning

Training an effective video-and-language model intuitively requires mult...

1 Jie Lei, et al. ∙

research

∙ 05/22/2022

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

The goal of this work is to build flexible video-language models that ca...

11 Zhenhailong Wang, et al. ∙

research

∙ 04/06/2022

ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound

We introduce an audiovisual method for long-range text-to-video retrieva...

2 Yan-Bo Lin, et al. ∙

research

∙ 03/25/2022

CNN LEGO: Disassembling and Assembling Convolutional Neural Network

Convolutional Neural Network (CNN), which mimics human visual perception...

0 Jiacong Hu, et al. ∙

research

∙ 03/10/2022

LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval

Dual encoders and cross encoders have been widely used for image-text re...

3 Jie Lei, et al. ∙

research

∙ 12/15/2021

Transcoded Video Restoration by Temporal Spatial Auxiliary Network

In most video platforms, such as YouTube and TikTok, the played videos ...

0 Li Xu, et al. ∙

research

∙ 08/01/2021

Boundary Knowledge Translation based Reference Semantic Segmentation

Given a reference object of an unknown type in an image, human observers...

0 Lechao Cheng, et al. ∙

research

∙ 08/01/2021

Edge-competing Pathological Liver Vessel Segmentation with Limited Labels

The microvascular invasion (MVI) is a major prognostic factor in hepatoc...

0 Zunlei Feng, et al. ∙

research

∙ 07/30/2021

mTVR: Multilingual Moment Retrieval in Videos

We introduce mTVR, a large-scale multilingual video moment retrieval dat...

5 Jie Lei, et al. ∙

research

∙ 07/20/2021

QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Detecting customized moments and highlights from videos given natural la...

11 Jie Lei, et al. ∙

research

∙ 06/21/2021

VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning

Video understanding relies on perceiving the global content and modeling...

8 Hao Tan, et al. ∙

research

∙ 06/08/2021

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

Most existing video-and-language (VidL) research focuses on a single dat...

3 Linjie Li, et al. ∙

research

∙ 06/01/2021

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models

With large-scale pre-training, the past two years have witnessed signifi...

22 Linjie Li, et al. ∙

research

∙ 02/11/2021

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

The canonical approach to video-and-language learning (e.g., video quest...

2 Jie Lei, et al. ∙

research

∙ 02/04/2021

Unifying Vision-and-Language Tasks via Text Generation

Existing methods for vision-and-language learning typically require desi...

38 Jaemin Cho, et al. ∙

research

∙ 01/27/2021

Low-Power Audio Keyword Spotting using Tsetlin Machines

The emergence of Artificial Intelligence (AI) driven Keyword Spotting (K...

0 Jie Lei, et al. ∙

research

∙ 10/15/2020

What is More Likely to Happen Next? Video-and-Language Future Event Prediction

Given a video with aligned dialogue, people can often infer what is more...

3 Jie Lei, et al. ∙

research

∙ 07/04/2020

A Novel Multi-Step Finite-State Automaton for Arbitrarily Deterministic Tsetlin Machine Learning

Due to the high energy consumption and scalability challenges of deep le...

0 K. Darshana Abeyrathna, et al. ∙

research

∙ 05/11/2020

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

Generating multi-sentence descriptions for videos is one of the most cha...

1 Jie Lei, et al. ∙

research

∙ 01/24/2020

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval

We introduce a new multimodal retrieval task - TV show Retrieval (TVR), ...

8 Jie Lei, et al. ∙

research

∙ 04/25/2019

TVQA+: Spatio-Temporal Grounding for Video Question Answering

We present the task of Spatio-Temporal Video Question Answering, which r...

0 Jie Lei, et al. ∙

research

∙ 09/05/2018

TVQA: Localized, Compositional Video Question Answering

Recent years have witnessed an increasing interest in image-based questi...

0 Jie Lei, et al. ∙

research

∙ 07/19/2018

Selective Zero-Shot Classification with Augmented Attributes

In this paper, we introduce a selective zero-shot classification problem...

0 Jie Song, et al. ∙

research

∙ 11/14/2017

TripletGAN: Training Generative Model with Triplet Loss

As an effective way of metric learning, triplet loss has been widely use...

0 Gongze Cao, et al. ∙

