Daqing Liu

research

∙ 06/01/2023

Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation

Text-conditional diffusion models are able to generate high-fidelity ima...

0 Minghui Hu, et al. ∙

research

∙ 05/10/2023

MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis

Existing multimodal conditional image synthesis (MCIS) methods generate ...

0 Jianbin Zheng, et al. ∙

research

∙ 11/21/2022

Cross-Modal Contrastive Learning for Robust Reasoning in VQA

Multi-modal reasoning in visual question answering (VQA) has witnessed r...

0 Qi Zheng, et al. ∙

research

∙ 06/21/2022

SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders

Recently, significant progress has been made in masked image modeling to...

0 Gang Li, et al. ∙

research

∙ 06/14/2022

TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer

In this work, we explore neat yet effective Transformer-based frameworks...

19 Jiajun Deng, et al. ∙

research

∙ 06/02/2022

Modeling Image Composition for Complex Scene Generation

We present a method that achieves state-of-the-art results on challengin...

5 Zuopeng Yang, et al. ∙

research

∙ 01/06/2022

Compact Bidirectional Transformer for Image Captioning

Most current image captioning models typically generate captions from le...

0 Yuanen Zhou, et al. ∙

research

∙ 07/17/2020

Learning to Discretely Compose Reasoning Module Networks for Video Captioning

Generating natural language descriptions for videos, i.e., video caption...

9 Ganchao Tan, et al. ∙

research

∙ 04/01/2020

More Grounded Image Captioning by Distilling Image-Text Matching Model

Visual attention not only improves the performance of image captioners, ...

24 Yuanen Zhou, et al. ∙

research

∙ 06/09/2019

Referring Expression Grounding by Marginalizing Scene Graph Likelihood

We focus on the task of grounding referring expressions in images, e.g.,...

3 Daqing Liu, et al. ∙

research

∙ 06/06/2019

Context-Aware Visual Policy Network for Fine-Grained Image Captioning

With the maturity of visual detection techniques, we are more ambitious ...

0 Zheng-Jun Zha, et al. ∙

research

∙ 06/05/2019

Learning to Compose and Reason with Language Tree Structures for Visual Grounding

Grounding natural language in images, such as localizing "the black dog ...

0 Richang Hong, et al. ∙

research

∙ 12/08/2018

Explainability by Parsing: Neural Module Tree Networks for Natural Language Visual Grounding

Grounding natural language in images essentially requires composite visu...

0 Daqing Liu, et al. ∙

research

∙ 08/16/2018

Context-Aware Visual Policy Network for Sequence-Level Image Captioning

Many vision-language tasks can be reduced to the problem of sequence pre...

0 Daqing Liu, et al. ∙

