Directional Pruning of Deep Neural Networks

06/16/2020
by Shih-Kang Chao, et al.

In light of the fact that stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional pruning method that searches for a sparse minimizer in that flat region. The proposed pruning method is automatic in the sense that neither retraining nor expert knowledge is required. To overcome the computational difficulty of estimating the flat directions, we propose a carefully tuned ℓ_1 proximal gradient algorithm that provably achieves the directional pruning with a small learning rate after sufficient training. The empirical results show that our algorithm performs competitively in the highly sparse regime (92% sparsity) among many existing automatic pruning methods on ResNet50 with ImageNet, while using only slightly more wall time and memory than SGD. Using VGG16 and the wide ResNet 28x10 on CIFAR-10 and CIFAR-100, we demonstrate that our algorithm reaches the same minimum valley as SGD, and that the minima found by our algorithm and by SGD do not deviate in directions that impact the training loss.
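To make the ℓ_1 proximal gradient idea concrete, below is a minimal sketch of a generic proximal gradient (soft-thresholding) update applied after an SGD step. It is an illustration of the general technique only, not the paper's tuned algorithm; the penalty strength `lam`, the learning rate, and the toy gradient are placeholder assumptions.

```python
# Generic l1 proximal gradient step: an SGD step on the loss followed by the
# soft-thresholding proximal map, which sets small coordinates exactly to zero.
import numpy as np

def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1: shrink each weight toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def proximal_sgd_step(w, grad, lr, lam):
    """One update: gradient descent on the training loss, then the l1 prox."""
    w = w - lr * grad                    # plain SGD step
    return soft_threshold(w, lr * lam)   # shrink/prune coordinates

# Toy usage with a random gradient; in practice `grad` comes from a minibatch.
rng = np.random.default_rng(0)
w = rng.normal(size=10)
grad = rng.normal(size=10)
w = proximal_sgd_step(w, grad, lr=0.1, lam=0.5)
print(w)  # some coordinates are exactly zero (pruned)
```

In the paper's setting, the key is how the ℓ_1 penalty is tuned so that the shrinkage acts along the flat directions of the loss; this sketch only shows the basic proximal update mechanics.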
