Prompting Large Pre-trained Vision-Language Models For Compositional Concept Learning

11/09/2022
by Guangyue Xu, et al.

This work explores the zero-shot compositional learning ability of large pre-trained vision-language models (VLMs) within the prompt-based learning framework and proposes a model, PromptCompVL, to solve the compositional zero-shot learning (CZSL) problem. PromptCompVL makes two design choices: first, it uses soft prompting instead of hard prompting to inject learnable parameters that reprogram VLMs for compositional learning. Second, to address the compositional challenge, it uses a soft-embedding layer to learn primitive concepts that recur across different combinations. By combining soft embeddings and soft prompting, PromptCompVL achieves state-of-the-art performance on the MIT-States dataset. Furthermore, the proposed model achieves consistent improvements over other CLIP-based methods, which shows the effectiveness of the proposed prompting strategies for CZSL.
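To make the two design choices concrete, below is a minimal sketch of the idea of combining learnable soft-prompt vectors with soft embeddings for attribute and object primitives, scored against image features by cosine similarity as in CLIP-style matching. This is not the authors' implementation: the class name, the mean-pool stand-in for a frozen text encoder, the dimensions, and the toy label counts are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class PromptedCompositionScorer(nn.Module):
    """Sketch: learnable soft-prompt vectors plus soft embeddings for
    attribute/object primitives, matched to image features by cosine
    similarity (CLIP-style scoring). Illustrative only."""

    def __init__(self, n_attrs, n_objs, dim=512, prompt_len=8):
        super().__init__()
        # Soft prompt: trainable context vectors that replace a hand-written
        # template such as "a photo of <attr> <obj>".
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        # Soft embeddings for attribute and object primitives; each primitive
        # vector is reused across every composition it appears in.
        self.attr_emb = nn.Embedding(n_attrs, dim)
        self.obj_emb = nn.Embedding(n_objs, dim)
        # Stand-in for a (frozen) text encoder: mean-pool + linear projection.
        self.text_proj = nn.Linear(dim, dim)

    def encode_pairs(self, attr_ids, obj_ids):
        # Build token sequences [soft prompt tokens, attr token, obj token].
        a = self.attr_emb(attr_ids).unsqueeze(1)            # (P, 1, dim)
        o = self.obj_emb(obj_ids).unsqueeze(1)              # (P, 1, dim)
        ctx = self.prompt.unsqueeze(0).expand(a.size(0), -1, -1)
        seq = torch.cat([ctx, a, o], dim=1)                 # (P, L+2, dim)
        return self.text_proj(seq.mean(dim=1))              # (P, dim)

    def forward(self, image_feats, attr_ids, obj_ids):
        # image_feats: (B, dim) features from a frozen image encoder.
        txt = self.encode_pairs(attr_ids, obj_ids)
        img = image_feats / image_feats.norm(dim=-1, keepdim=True)
        txt = txt / txt.norm(dim=-1, keepdim=True)
        return img @ txt.t()                                # (B, P) scores


# Toy usage: score 4 images against 3 attribute-object compositions
# (MIT-States has 115 attributes and 245 objects).
model = PromptedCompositionScorer(n_attrs=115, n_objs=245, dim=512)
attr_ids = torch.arange(3)            # e.g. "wet", "rusty", "sliced"
obj_ids = torch.tensor([0, 1, 2])     # e.g. "dog", "car", "apple"
scores = model(torch.randn(4, 512), attr_ids, obj_ids)
print(scores.shape)                   # torch.Size([4, 3])
```

At training time the image encoder features would come from a frozen VLM, and only the soft prompt and primitive embeddings would receive gradients; at test time, unseen attribute-object pairs are scored by recombining the learned primitive vectors.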
