Visual Manipulation Relationship Network
Grasping is one of the most important manipulation skills in everyday life, and when a scene contains several objects, the outcome depends heavily on the order in which they are grasped. Manipulation relationships are therefore needed to help a robot grasp and manipulate objects correctly. This paper presents a new convolutional neural network architecture, the Visual Manipulation Relationship Network (VMRN), which helps a robot detect targets and predict manipulation relationships in real time. To enable end-to-end training and meet the real-time requirements of robot tasks, we propose the Object Pairing Pooling Layer (OP2L), which allows all manipulation relationships to be predicted in a single forward pass. To train VMRN, we collect the Visual Manipulation Relationship Dataset (VMRD), which consists of 5185 images with more than 17000 object instances and the manipulation relationships between all possible pairs of objects in every image, labeled using the manipulation relationship tree. Experimental results show that the new network architecture can detect objects and predict manipulation relationships simultaneously while meeting the real-time requirements of robot tasks.
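To make the idea of pairwise manipulation relationships concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each relationship can be written as a pair (a, b) meaning "object a must be grasped before object b"; that encoding, and the function names `ordered_pairs` and `grasp_order`, are hypothetical and chosen only for illustration. The first function enumerates every ordered pair of detected objects, which is the set of candidates a network would have to classify; the second turns predicted relationships into a valid grasping order.

```python
from itertools import permutations


def ordered_pairs(num_objects):
    """Return every ordered pair (i, j) with i != j of object indices."""
    return list(permutations(range(num_objects), 2))


def grasp_order(num_objects, grasp_before):
    """Return an order in which the objects can be grasped.

    grasp_before is a set of (a, b) pairs, each meaning that object a
    must be grasped before object b (an assumed encoding, for example
    because a rests on top of b).
    """
    # For each object, collect the objects that must be removed first.
    blockers = {i: set() for i in range(num_objects)}
    for a, b in grasp_before:
        blockers[b].add(a)

    order = []
    remaining = set(range(num_objects))
    while remaining:
        # An object is free when none of its blockers is still in the scene.
        free = sorted(i for i in remaining if not (blockers[i] & remaining))
        if not free:
            raise ValueError("cyclic relationships: no valid grasp order")
        order.extend(free)
        remaining -= set(free)
    return order
```

For a scene with three objects, `ordered_pairs(3)` yields six candidate pairs, and `grasp_order(3, {(0, 1), (1, 2)})` returns `[0, 1, 2]`: object 0 is grasped first, then object 1, then object 2.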