Language modality within the vision language pretraining framework is
in...
We study sequential decision making problems aimed at maximizing the exp...
Improving the retrieval relevance on noisy datasets is an emerging need ...
Aligning signals from different modalities is an important step in
visio...
Vision-language representation learning largely benefits from image-text...
The fact that there exists a gap between low-level features and semantic...
Most existing distance metric learning approaches use fully labeled data...
Much work in robotics has focused on "human-in-the-loop" learning techni...
An interpretable generative model for handwritten digits synthesis is
pr...
The model parameters of convolutional neural networks (CNNs) are determi...
Previous methods have dealt with discrete manipulation of facial attribu...
Recently, the popularity of depth-sensors such as Kinect has made depth
...