Convergence of Deep Neural Networks to a Hierarchical Covariance Matrix Decomposition

03/14/2017
by Nima Dehmamy, et al.

We show that in a deep neural network trained with ReLU activations, the low-lying layers should be replaceable with truncated, linearly activated layers. We derive the gradient descent equations in this truncated linear model and demonstrate that, if the distribution of the training data is stationary during training, the optimal choice of weights in these low-lying layers is the eigenvectors of the covariance matrix of the data. If the training data are sufficiently random and uniform, these eigenvectors can be found using only a small fraction of the training data, thus reducing the computational complexity of training. We show how this can be done recursively to form successive, trained layers. At least for the first layer, our tests show that this approach improves image classification while reducing network size.
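
To make the construction concrete, the following is a minimal NumPy sketch of the procedure the abstract describes: the leading covariance eigenvectors of a small random subset of the data serve as fixed weights for a linear layer, and the step is repeated on the projected features to build successive layers. It is not the authors' code; the function names, layer sizes, and subset fraction are illustrative assumptions.

    import numpy as np

    def covariance_layer(X, n_components):
        """Top `n_components` eigenvectors of the covariance of X (rows are
        samples), used as fixed weights of a truncated linear layer."""
        Xc = X - X.mean(axis=0, keepdims=True)            # center the data
        cov = Xc.T @ Xc / (X.shape[0] - 1)                 # empirical covariance
        eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
        order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest ones
        return eigvecs[:, order]                           # shape (d, n_components)

    def build_covariance_stack(X, layer_sizes, subset_frac=0.1, seed=0):
        """Recursively build layers: each projects onto the covariance
        eigenvectors of the previous layer's output, estimated on a subset."""
        rng = np.random.default_rng(seed)
        weights, features = [], X
        for k in layer_sizes:
            n_sub = max(k + 1, int(subset_frac * features.shape[0]))
            idx = rng.choice(features.shape[0], size=n_sub, replace=False)
            W = covariance_layer(features[idx], k)         # fit on a small subset
            weights.append(W)
            features = features @ W                        # apply the linear layer
        return weights, features

    # Example usage on random data standing in for flattened images:
    X = np.random.default_rng(1).normal(size=(1000, 784))
    weights, Z = build_covariance_stack(X, layer_sizes=[128, 64])
    print([W.shape for W in weights], Z.shape)

In a full model, the remaining (e.g., ReLU-activated) layers would then be trained on the projected features Z, which is where the claimed reduction in training cost comes from.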
