Fast Evaluation and Approximation of the Gauss-Newton Hessian Matrix for the Multilayer Perceptron
We introduce a fast algorithm for entry-wise evaluation of the Gauss-Newton Hessian (GNH) matrix for the multilayer perceptron. The algorithm has a precomputation step and a sampling step. While it generally requires O(Nn) work to compute an entry (and the entire column) of the GNH matrix for a neural network with N parameters and n data points, our fast sampling algorithm reduces the cost to O(n + d/ϵ^2) work, where d is the output dimension of the network and ϵ is a prescribed accuracy (independent of N). One application of our algorithm is constructing the hierarchical-matrix (H-matrix) approximation of the GNH matrix for solving linear systems and eigenvalue problems. While it generally requires O(N^2) memory to store the GNH matrix and O(N^3) work to factorize it, the H-matrix approximation requires only O(N r_o) memory and O(N r_o^2) work to factorize, where r_o ≪ N is the maximum rank of the off-diagonal blocks of the GNH matrix. We demonstrate the performance of our fast algorithm and the H-matrix approximation on classification and autoencoder neural networks.
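The abstract does not spell out the GNH itself; for a least-squares loss it takes the standard form G = (1/n) Σ_k J_k^T J_k, where J_k is the d×N Jacobian of the network output at data point k with respect to the parameters. The sketch below is a minimal JAX illustration under that assumption; the network, its sizes, and names such as `mlp` and `gnh_entry` are hypothetical, and this is the naive O(Nn)-per-entry reference computation, not the paper's precomputation-and-sampling algorithm.

```python
# Minimal sketch (assumed least-squares loss, illustrative sizes):
# GNH = (1/n) * sum_k J_k^T J_k, with J_k the d-by-N per-sample Jacobian.
import jax
import jax.numpy as jnp

D, H, OUT = 8, 16, 4            # input, hidden, output dims (illustrative)
N = H * D + H + OUT * H         # total number of parameters
n = 32                          # number of data points

def mlp(params, x):
    # Two-layer perceptron; params is a flat vector of all N parameters.
    W1 = params[:H * D].reshape(H, D)
    b1 = params[H * D:H * D + H]
    W2 = params[H * D + H:].reshape(OUT, H)
    return W2 @ jnp.tanh(W1 @ x + b1)

params = jax.random.normal(jax.random.PRNGKey(0), (N,)) * 0.1
X = jax.random.normal(jax.random.PRNGKey(1), (n, D))

# Per-sample Jacobians J_k of shape (d, N); forming them is exactly the
# O(Nn) cost that the paper's sampling step avoids for a single entry.
jac = jax.vmap(lambda x: jax.jacrev(mlp)(params, x))(X)  # shape (n, d, N)

def gnh_entry(i, j):
    # Exact GNH entry (i, j): average over data of <J_k[:, i], J_k[:, j]>.
    return jnp.mean(jnp.einsum('kd,kd->k', jac[:, :, i], jac[:, :, j]))

print(gnh_entry(0, 1))
```

This exact value is the quantity the paper's sampling algorithm approximates to the prescribed accuracy ϵ at O(n + d/ϵ^2) cost after precomputation, with no dependence on N.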