Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses
3D human pose estimation from a single image is an inverse problem due to the inherent ambiguity of the missing depth. Several previous works addressed the inverse problem by generating multiple hypotheses. However, these works are strongly supervised and require ground truth 2D-to-3D correspondences which can be difficult to obtain. In this paper, we propose a weakly supervised deep generative network to address the inverse problem and circumvent the need for ground truth 2D-to-3D correspondences. To this end, we design our network to model a proposal distribution which we use to approximate the unknown multi-modal target posterior distribution. We achieve the approximation by minimizing the KL divergence between the proposal and target distributions, and this leads to a 2D reprojection error and a prior loss term that can be weakly supervised. Furthermore, we determine the most probable solution as the conditional mode of the samples using the mean-shift algorithm. We evaluate our method on three benchmark datasets – Human3.6M, MPII and MPI-INF-3DHP. Experimental results show that our approach is capable of generating multiple feasible hypotheses and achieves state-of-the-art results compared to existing weakly supervised approaches. Our source code is available at the project website.
READ FULL TEXT