Existing fine-grained intensity regulation methods rely on explicit cont...
Vision-and-language pretraining (VLP) aims to learn generic multimodal
r...
Designing effective neural networks is fundamentally important in deep
m...
Learning an effective attention mechanism for multimodal data is importa...
Visual Question Answering (VQA) requires a fine-grained and simultaneous...