Large Text-to-Image(T2I) diffusion models have shown a remarkable capabi...
Referring Expression Generation (REG) aims to generate unambiguous Refer...
Existing multimodal conversation agents have shown impressive abilities ...
In this work, we present a conceptually simple and effective method to t...