Hexiang Hu

research

∙ 05/31/2023

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

Much of the previous work towards digital agents for graphical user inte...

0 Peter Shaw, et al. ∙

research

∙ 04/01/2023

Subject-driven Text-to-Image Generation via Apprenticeship Learning

Recent text-to-image generation models like DreamBooth have made remarka...

0 Wenhu Chen, et al. ∙

research

∙ 02/23/2023

Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?

Large language models have demonstrated an emergent capability in answer...

0 Yang Chen, et al. ∙

research

∙ 02/22/2023

Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Large-scale multi-modal pre-training models such as CLIP and PaLI exhibi...

0 Hexiang Hu, et al. ∙

research

∙ 10/06/2022

MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text

While language models store a massive amount of world knowledge implicit...

0 Wenhu Chen, et al. ∙

research

∙ 09/29/2022

Re-Imagen: Retrieval-Augmented Text-to-Image Generator

Research on text-to-image generation has witnessed significant progress ...

0 Wenhu Chen, et al. ∙

research

∙ 09/12/2022

PreSTU: Pre-Training for Scene-Text Understanding

The ability to read and reason about texts in an image is often lacking ...

0 Jihyung Kil, et al. ∙

research

∙ 09/29/2021

Visually Grounded Concept Composition

We investigate ways to compose complex concepts in texts from primitive ...

1 Bowen Zhang, et al. ∙

research

∙ 09/25/2021

Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?

We analyze the grounded SCAN (gSCAN) benchmark, which was recently propo...

8 Linlu Qiu, et al. ∙

research

∙ 02/17/2021

A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection

Object frequencies in daily scenes follow a long-tailed distribution. Ma...

14 Cheng Zhang, et al. ∙

research

∙ 11/18/2020

A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus

Identifying a short segment in a long video that semantically matches a ...

0 Bowen Zhang, et al. ∙

research

∙ 11/09/2020

Learning the Best Pooling Strategy for Visual Semantic Embedding

Visual Semantic Embedding (VSE) is a dominant approach for vision-langua...

0 Jiacheng Chen, et al. ∙

research

∙ 10/06/2020

Learning to Represent Image and Text with Denotation Graph

Learning to fuse vision and language information and representing them i...

0 Bowen Zhang, et al. ∙

research

∙ 07/18/2020

Drinking from a Firehose: Continual Learning with Web-scale Natural Language

Continual learning systems will interact with humans, with each other, a...

14 Hexiang Hu, et al. ∙

research

∙ 05/10/2020

BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps

Learning to follow instructions is of fundamental importance to autonomo...

18 Wang Zhu, et al. ∙

research

∙ 01/13/2020

Visual Storytelling via Predicting Anchor Word Embeddings in the Stories

We propose a learning model for the task of visual storytelling. The mai...

6 Bowen Zhang, et al. ∙

research

∙ 10/30/2019

Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation

Model-agnostic meta-learners aim to acquire meta-learned parameters from...

5 Risto Vuorio, et al. ∙

research

∙ 06/07/2019

Learning Classifier Synthesis for Generalized Few-Shot Learning

Visual recognition in real-world requires handling long-tailed and even ...

9 Han-Jia Ye, et al. ∙

research

∙ 04/05/2019

Synthesized Policies for Transfer and Adaptation across Tasks and Environments

The ability to transfer in reinforcement learning is key towards buildin...

6 Hexiang Hu, et al. ∙

research

∙ 01/19/2019

Binary Image Selection (BISON): Interpretable Evaluation of Visual Grounding

Providing systems the ability to relate linguistic and visual content is...

0 Hexiang Hu, et al. ∙

research

∙ 12/18/2018

Toward Multimodal Model-Agnostic Meta-Learning

Gradient-based meta-learners such as MAML are able to learn a meta-prior...

0 Risto Vuorio, et al. ∙

research

∙ 12/10/2018

Learning Embedding Adaptation for Few-Shot Learning

Learning with limited data is a key challenge for visual recognition. Fe...

18 Han-Jia Ye, et al. ∙

research

∙ 10/25/2018

Engaging Image Captioning Via Personality

Standard image captioning tasks such as COCO and Flickr30k are factual, ...

2 Kurt Shuster, et al. ∙

research

∙ 10/16/2018

Cross-Modal and Hierarchical Modeling of Video and Text

Visual data and text data are composed of information at multiple granul...

0 Bowen Zhang, et al. ∙

research

∙ 08/13/2018

Multi-Task Learning for Sequence Tagging: An Empirical Study

We study three general multi-task learning (MTL) approaches on 11 sequen...

0 Soravit Changpinyo, et al. ∙

research

∙ 06/10/2018

Cross-Dataset Adaptation for Visual Question Answering

We investigate the problem of cross-dataset adaptation for visual questi...

0 Wei-Lun Chao, et al. ∙

research

∙ 06/10/2018

Learning Answer Embeddings for Visual Question Answering

We propose a novel probabilistic model for visual question answering (Vi...

0 Hexiang Hu, et al. ∙

research

∙ 02/18/2018

Structured Label Inference for Visual Understanding

Visual data such as images and videos contain a rich source of structure...

0 Nelson Nauata, et al. ∙

research

∙ 12/02/2017

Compressed Video Action Recognition

Training robust deep video representations has proven to be much more ch...

0 Chao-Yuan Wu, et al. ∙

research

∙ 04/24/2017

Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets

Visual question answering (QA) has attracted a lot of attention lately, ...

0 Wei-Lun Chao, et al. ∙

research

∙ 03/29/2017

LabelBank: Revisiting Global Perspectives for Semantic Segmentation

Semantic segmentation requires a detailed labeling of image pixels by ob...

0 Hexiang Hu, et al. ∙

research

∙ 12/28/2016

FastMask: Segment Multi-scale Object Candidates in One Shot

Objects appear to scale differently in natural images. This fact require...

0 Hexiang Hu, et al. ∙

research

∙ 11/24/2016

Recalling Holistic Information for Semantic Segmentation

Semantic segmentation requires a detailed labeling of image pixels by ob...

0 Hexiang Hu, et al. ∙

research

∙ 11/13/2015

Structure Inference Machines: Recurrent Neural Networks for Analyzing Relations in Group Activity Recognition

Rich semantic relations are important in a variety of visual recognition...

0 Zhiwei Deng, et al. ∙
