Difei Gao

08/19/2023

Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces

The exploitation of Deepfake techniques for malicious intentions has dri...

Juan Hu, et al.

06/27/2023

GroundNLQ @ Ego4D Natural Language Queries Challenge 2023

In this report, we present our champion solution for Ego4D Natural Langu...

Zhijian Hou, et al.

06/14/2023

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn

Recent research on Large Language Models (LLMs) has led to remarkable ad...

Difei Gao, et al.

03/26/2023

Affordance Grounding from Demonstration Video to Target Image

Humans excel at learning from expert demonstrations and solving their ow...

Joya Chen, et al.

03/03/2023

Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection

Deepfake techniques have been widely used for malicious purposes, prompt...

Juan Hu, et al.

12/19/2022

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering

To build Video Question Answering (VideoQA) systems capable of assisting...

Difei Gao, et al.

11/16/2022

An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022

This technical report describes the CONE approach for Ego4D Natural Lang...

Zhijian Hou, et al.

09/22/2022

CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding

Video temporal grounding (VTG) aims to localize temporal moments in a...

Zhijian Hou, et al.

08/24/2022

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

VQA is an ambitious task aiming to answer any image-related question. Ho...

Stan Weixian Lei, et al.

04/01/2022

Generic Event Boundary Captioning: A Benchmark for Status Changes Understanding

Cognitive science has shown that humans perceive videos in terms of even...

Yuxuan Wang, et al.

03/08/2022

AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant

A long-standing goal of intelligent assistants such as AR glasses/robots...

Benita Wong, et al.

11/30/2021

AssistSR: Affordance-centric Question-driven Video Segment Retrieval

It is still a pipe dream that AI assistants on phone and AR glasses can ...

Stan Weixian Lei, et al.

03/31/2020

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Answering questions that require reading texts in an image is challengin...

Difei Gao, et al.

08/08/2019

From Two Graphs to N Questions: A VQA Dataset for Compositional Reasoning on Vision and Commonsense

Visual Question Answering (VQA) is a challenging task for evaluating the...

Difei Gao, et al.