HGR-Net: A Two-stage Convolutional Neural Network for Hand Gesture Segmentation and Recognition
Robust recognition of hand gestures in real-world applications remains challenging because of factors such as cluttered backgrounds and uncontrolled environmental conditions. In most existing methods, hand segmentation is a primary step for hand gesture recognition, because it removes redundant background information from the image before the image is passed to the recognition stage. Therefore, in this paper we propose a two-stage deep convolutional neural network (CNN) architecture called HGR-Net, where the first stage performs accurate pixel-level semantic segmentation of the hand region and the second stage identifies the hand gesture. The segmentation stage combines a fully convolutional deep residual neural network with atrous spatial pyramid pooling. Although the segmentation sub-network is trained without depth information, it is robust to challenging conditions such as changes in lighting and complex backgrounds. In the recognition stage, a two-stream CNN is used to obtain the best classification score. We also apply an effective data augmentation technique to maximize the generalization capability of HGR-Net. Extensive experiments on public hand gesture datasets show that our deep architecture achieves strong performance in both segmentation and recognition of static hand gestures.
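As a rough illustration of the atrous spatial pyramid pooling idea mentioned above, the following pure-Python sketch applies one kernel at several dilation rates and stacks the responses per pixel. It is a toy under stated assumptions, not the HGR-Net implementation: the function names `atrous_conv2d` and `aspp` and the rates `(1, 2, 4)` are hypothetical, and a real ASPP module uses separate learned filters per branch on multi-channel feature maps.

```python
def atrous_conv2d(image, kernel, rate):
    """Same-size 2D atrous (dilated) cross-correlation with zero padding.

    image:  list of rows of floats (single channel)
    kernel: odd-sized square list of rows
    rate:   spacing between kernel taps (rate=1 is an ordinary convolution)
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Taps are spread apart by `rate`, enlarging the
                    # receptive field without adding parameters.
                    yy = y + (i - half) * rate
                    xx = x + (j - half) * rate
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def aspp(image, kernel, rates=(1, 2, 4)):
    """Toy pyramid: run the same kernel at each rate (an assumption made
    for brevity) and stack the branch responses into a per-pixel vector."""
    branches = [atrous_conv2d(image, kernel, r) for r in rates]
    h, w = len(image), len(image[0])
    return [[[b[y][x] for b in branches] for x in range(w)] for y in range(h)]


# Usage: a 5x5 image with a single bright pixel at the centre.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0
kernel = [[1.0] * 3 for _ in range(3)]
features = aspp(image, kernel)
```

In this example the corner pixel only "sees" the centre at rate 2, which shows how different rates capture context at different scales from the same input.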