Text-to-image models (T2I) offer a new level of flexibility by allowing ...
We introduce a zero-shot video captioning method that employs two frozen...
Given an input image, and nothing else, our method returns the bounding ...
Recent text-to-image matching models apply contrastive learning to large...