The Best of Both Worlds: Learning Geometry-based 6D Object Pose Estimation
We address the task of estimating the 6D pose of known rigid objects from RGB and RGB-D input images, in scenarios where the objects are heavily occluded. Our main contribution is a new modular processing pipeline. The first module localizes all known objects in the image using an existing instance segmentation network. The second module uses an encoder-decoder network to densely regress the object surface positions in the object's local coordinate system. The third module is a purely geometry-based algorithm that outputs the final 6D object poses. The first two modules are learned from data and the last one is not; we believe this combines the best of both worlds, geometry-based and learning-based algorithms for 6D object pose estimation. This is validated by state-of-the-art results for RGB input and a slight improvement over the state of the art for RGB-D input. In contrast to previous work, we achieve these results with the same pipeline for both RGB and RGB-D input. Our second contribution, which was needed to obtain these results, is a new 3D occlusion-aware and object-centric data augmentation procedure.
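The abstract does not say which geometric algorithm the third module uses, so the following is only an illustrative sketch, not the authors' method. It assumes the RGB-D case, where each object pixel has a regressed object-space coordinate and a camera-space point obtained from depth, giving 3D-3D correspondences. A standard closed-form way to fit a rigid pose to such correspondences is the Kabsch algorithm. The function name `fit_rigid_pose` and the synthetic data in `demo` are hypothetical.

```python
import numpy as np


def fit_rigid_pose(obj_pts, cam_pts):
    """Least-squares rigid fit (Kabsch): cam_pts ~= obj_pts @ R.T + t.

    obj_pts: (N, 3) object-space coordinates (assumed network output).
    cam_pts: (N, 3) camera-space points (assumed to come from depth).
    """
    obj_pts = np.asarray(obj_pts, dtype=float)
    cam_pts = np.asarray(cam_pts, dtype=float)
    obj_c = obj_pts.mean(axis=0)
    cam_c = cam_pts.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (obj_pts - obj_c).T @ (cam_pts - cam_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cam_c - R @ obj_c
    return R, t


def demo():
    """Recover a known pose from noise-free synthetic correspondences."""
    rng = np.random.default_rng(0)
    obj_pts = rng.uniform(-0.1, 0.1, size=(50, 3))
    a = np.deg2rad(30.0)
    R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a),  np.cos(a), 0.0],
                       [0.0,        0.0,       1.0]])
    t_true = np.array([0.05, -0.02, 0.8])
    cam_pts = obj_pts @ R_true.T + t_true
    R_est, t_est = fit_rigid_pose(obj_pts, cam_pts)
    return R_est, t_est, R_true, t_true
```

Regressed coordinates generally contain outliers, especially under heavy occlusion, so a fit like this would typically be run inside a robust scheme such as RANSAC rather than on all correspondences at once. For RGB-only input, the analogous step would work from 2D-3D correspondences with a PnP solver instead.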