While large-scale image-text pretrained models such as CLIP have been us...
Large language models (LLMs) have shown remarkable capabilities in gener...
Reading to act is a prevalent but challenging task which requires the ab...
Text segmentation is a prerequisite in many real-world text-related task...
We study the problem of concept induction in visual reasoning, i.e., ide...
Learning segmentation from synthetic data and adapting to real data can ...
We consider the problem of unsupervised domain adaptation for semantic s...
Clothing retrieval is a challenging problem in computer vision. With the...