Memorization of the relation between entities in a dataset can lead to
p...
We propose a novel multimodal architecture for Scene Text Visual Questio...
We present DocFormer – a multi-modal transformer based architecture for ...
Estimating 3D hand poses from a single RGB image is challenging because ...
Estimating 3D hand pose directly from RGB imagesis challenging but has g...
Estimating3D hand poses from RGB images is essentialto a wide range of
p...
Self-supervised representation learning has seen remarkable progress in ...
Estimating the 3D hand pose from a monocular RGB image is important but
...
Hand pose estimation from a monocular RGB image is an important but
chal...
Question answering (QA) has achieved promising progress recently. Howeve...
Variational inference provides approximations to the computationally
int...