Shijie Geng

research

∙ 04/28/2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

How to efficiently transform large language models (LLMs) into instructi...

1 Peng Gao, et al. ∙

research

∙ 03/27/2023

Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens

Contrastive learning-based vision-language pre-training approaches, such...

0 Yuxiao Chen, et al. ∙

research

∙ 01/30/2023

Mono-STAR: Mono-camera Scene-level Tracking and Reconstruction

We present Mono-STAR, the first real-time 3D reconstruction system that ...

0 Haonan Chang, et al. ∙

research

∙ 08/06/2022

Frozen CLIP Models are Efficient Video Learners

Video recognition has been dominated by the end-to-end learning paradigm...

0 Ziyi Lin, et al. ∙

research

∙ 07/20/2022

Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

Despite the success of fully-supervised human skeleton sequence modeling...

0 Yuxiao Chen, et al. ∙

research

∙ 12/11/2021

COMPOSER: Compositional Learning of Group Activity in Videos

Group Activity Recognition (GAR) detects the activity performed by a gro...

14 Honglu Zhou, et al. ∙

research

∙ 11/29/2021

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

The visual world naturally exhibits a long-tailed distribution of open c...

9 Teli Ma, et al. ∙

research

∙ 10/13/2021

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (...

0 Ankit P. Shah, et al. ∙

research

∙ 09/24/2021

Dense Contrastive Visual-Linguistic Pretraining

Inspired by the success of BERT, several multimodal representation learn...

0 Lei Shi, et al. ∙

research

∙ 06/04/2021

Scalable Transformers for Neural Machine Translation

Transformer has been widely adopted in Neural Machine Translation (NMT) ...

0 Peng Gao, et al. ∙

research

∙ 09/23/2020

Multi-Pass Transformer for Machine Translation

In contrast with previous approaches where information flows only toward...

0 Peng Gao, et al. ∙

research

∙ 07/26/2020

Contrastive Visual-Linguistic Pretraining

Several multi-modality representation learning approaches such as LXMERT...

10 Lei Shi, et al. ∙

research

∙ 07/08/2020

Spatio-Temporal Scene Graphs for Video Dialog

The Audio-Visual Scene-aware Dialog (AVSD) task requires an agent to ind...

0 Shijie Geng, et al. ∙

research

∙ 06/03/2020

Fairness-Aware Explainable Recommendation over Knowledge Graphs

There has been growing attention on fairness considerations recently, es...

1 Zuohui Fu, et al. ∙

research

∙ 01/29/2020

ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs

A number of cross-lingual transfer learning approaches based on neural n...

0 Zuohui Fu, et al. ∙

research

∙ 07/16/2019

2nd Place Solution to the GQA Challenge 2019

We present a simple method that achieves unexpectedly superior performan...

1 Shijie Geng, et al. ∙

research

∙ 08/20/2018

CU-Net: Coupled U-Nets

We design a new connectivity pattern for the U-Net architecture. Given s...

0 Zhiqiang Tang, et al. ∙

research

∙ 08/07/2018

Quantized Densely Connected U-Nets for Efficient Landmark Localization

In this paper, we propose quantized densely connected U-Nets for efficie...

0 Zhiqiang Tang, et al. ∙

