In this report, we present our champion solution for Ego4D Natural Langu...
This technical report describes the CONE approach for Ego4D Natural Lang...
Video temporal grounding (VTG) aims to localize temporal moments in a...
Cross-modal representation learning has become a new normal for bridging...
This paper tackles a recently proposed Video Corpus Moment Retrieval tas...
This notebook paper presents an overview and comparative analysis of our...