Multimodal Large Language Models (MLLMs) have recently sparked significa...
Conventional multi-label classification (MLC) methods assume that all sa...
Electronic Health Records (EHR) are generated from clinical routine care...
Semi-supervised domain adaptation (SSDA) adapts a learner to a new domai...
Prompt tuning, a recently emerging paradigm, enables the powerful vision...
Recommendation systems have shown great potential to solve the informati...
Temporal grounding is the task of locating a specific segment from an un...
Device Model Generalization (DMG) is a practical yet under-investigated...
Video Object Grounding (VOG) is the problem of associating spatial objec...
Understanding human emotions is a crucial ability for intelligent robots...
Content-Based Image Retrieval (CIR) aims to search for a target image by...
While annotating decent amounts of data to satisfy sophisticated learnin...
Modeling dynamic scenes is important for many applications such as virtu...
Natural language spatial video grounding aims to detect the relevant obj...
In this paper, we propose a novel semi-supervised learning (SSL) framewo...
Text-based image captioning (TextCap) requires simultaneous comprehensio...
The contemporary visual captioning models frequently hallucinate objects...
Grounded video description (GVD) encourages captioning models to attend ...