Bei Liu

research

∙ 08/22/2023

ViCo: Engaging Video Comment Generation with Human Preference Rewards

Engaging video comments play an important role in video social media, as...

0 Yuchong Sun, et al. ∙

research

∙ 08/21/2023

Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations

Training deep generative models usually requires a large amount of data....

0 Seogkyu Jeon, et al. ∙

research

∙ 07/18/2023

Revisiting Latent Space of GAN Inversion for Real Image Editing

The exploration of the latent space in StyleGANs and GAN inversion exemp...

0 Kai Katsumata, et al. ∙

research

∙ 07/15/2023

SINC: Self-Supervised In-Context Learning for Vision-Language Tasks

Large Pre-trained Transformers exhibit an intriguing capacity for in-con...

0 Yi-Syuan Chen, et al. ∙

research

∙ 06/09/2023

Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots

Improving the generalization capabilities of general-purpose robotic age...

1 Jiange Yang, et al. ∙

research

∙ 05/31/2023

Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space

The exploration of the latent space in StyleGANs and GAN inversion exemp...

0 Kai Katsumata, et al. ∙

research

∙ 05/30/2023

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

We propose a novel framework for learning high-level cognitive capabilit...

1 Chuhao Jin, et al. ∙

research

∙ 05/18/2023

Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR

Due to the rapid development of computing hardware resources and the dra...

0 Hang Shao, et al. ∙

research

∙ 11/02/2022

Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022

Different speaker recognition challenges have been held to assess the sp...

0 Zhengyang Chen, et al. ∙

research

∙ 09/19/2022

SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022

This report describes the SJTU-AISPEECH system for the VoxCeleb Speaker ...

0 Zhengyang Chen, et al. ∙

research

∙ 09/14/2022

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

The pre-trained image-text models, like CLIP, have demonstrated the stro...

0 Hongwei Xue, et al. ∙

research

∙ 08/10/2022

Exploring Anchor-based Detection for Ego4D Natural Language Query

In this paper we provide the technical report of Ego4D natural language ...

0 Sipeng Zheng, et al. ∙

research

∙ 06/23/2022

The SJTU X-LANCE Lab System for CNSRC 2022

This technical report describes the SJTU X-LANCE Lab system for the thre...

0 Zhengyang Chen, et al. ∙

research

∙ 11/29/2021

Searching the Search Space of Vision Transformer

Vision Transformer has shown great visual representation power in substa...

0 Minghao Chen, et al. ∙

research

∙ 11/19/2021

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

We study joint video and language (VL) pre-training to enable cross-moda...

0 Hongwei Xue, et al. ∙

research

∙ 10/19/2021

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation

A creative image-and-text generative AI system mimics humans' extraordin...

0 Yupan Huang, et al. ∙

research

∙ 10/19/2021

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

We study the joint learning of image-to-text and text-to-image generatio...

0 Yupan Huang, et al. ∙

research

∙ 09/06/2021

Learning Fine-Grained Motion Embedding for Landscape Animation

In this paper we focus on landscape animation, which aims to generate ti...

0 Hongwei Xue, et al. ∙

research

∙ 08/10/2021

Reference-based Defect Detection Network

The defect detection task can be regarded as a realistic scenario of obj...

5 Zhaoyang Zeng, et al. ∙

research

∙ 06/25/2021

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training

Vision-Language Pre-training (VLP) aims to learn multi-modal representat...

0 Hongwei Xue, et al. ∙

research

∙ 04/07/2021

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

We study joint learning of Convolutional Neural Network (CNN) and Transf...

0 Zhicheng Huang, et al. ∙

research

∙ 06/04/2020

M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training

This paper presents a Multitask Multilingual Multimodal Pre-trained mode...

7 Haoyang Huang, et al. ∙

research

∙ 04/02/2020

Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers

We propose Pixel-BERT to align image pixels with text by deep multi-moda...

0 Zhicheng Huang, et al. ∙

research

∙ 11/24/2019

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences

A storyboard is a sequence of images to illustrate a story containing mu...

0 Shizhe Chen, et al. ∙

research

∙ 10/29/2019

Learning Rich Image Region Representation for Visual Question Answering

We propose to boost VQA by leveraging more powerful feature extractors b...

0 Bei Liu, et al. ∙

research

∙ 10/19/2019

Gastroscopic Panoramic View: Application to Automatic Polyps Detection under Gastroscopy

Endoscopic diagnosis is an important means for gastric polyp detection. ...

16 Chenfei Shi, et al. ∙

research

∙ 09/11/2019

WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection

We study weakly-supervised object detection (WSOD) which plays a vita...

0 Zhaoyang Zeng, et al. ∙

research

∙ 07/11/2019

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

Contextual reasoning is essential to understand events in long untrimmed...

0 Shizhe Chen, et al. ∙

research

∙ 04/23/2018

Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

Automatic generation of natural language from images has attracted exten...

0 Bei Liu, et al. ∙
