Recent advances in Scene Graph Generation (SGG) typically model the rela...
Text-video retrieval is a challenging cross-modal task, which aims to al...
Dynamic scene graphs generated from video clips could help enhance the s...
Contrastive learning-based video-language representation learning approa...
Most video-and-language representation learning approaches employ contra...