A Modular Approach to Block-diagonal Hessian Approximations for Second-order Optimization Methods
We propose a modular extension of the backpropagation algorithm that computes the block diagonal of the training objective's Hessian at various levels of refinement. The approach compartmentalizes the otherwise tedious construction of the Hessian into local modules. It is applicable to feedforward neural network architectures and can be integrated into existing machine learning libraries with relatively little overhead, facilitating the development of novel second-order optimization methods. Our formulation subsumes several recently proposed block-diagonal approximation schemes as special cases. Our PyTorch implementation is included with the paper.
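To make the idea of local Hessian modules concrete, the following is a minimal sketch and not the paper's PyTorch implementation. It assumes a single layer a = sigmoid(W x + b) with squared-error loss, and uses only the standard second-order chain rule. All function names (hbp_linear, hbp_sigmoid, hessian_blocks) are hypothetical and chosen for illustration.

```python
import math

def matmul(A, B):
    # Plain nested-list matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-t)) for t in v]

def affine(W, x, b):
    # z = W x + b
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def hbp_linear(W, H_out):
    # Hypothetical module: for z = W x + b, the Hessian w.r.t. x is W^T H_z W.
    return matmul(transpose(W), matmul(H_out, W))

def hbp_sigmoid(z, g_out, H_out):
    # Hypothetical module: for a = sigmoid(z), the Hessian w.r.t. z is
    # D H_a D + diag(sigmoid''(z) * g_a), with D = diag(sigmoid'(z)).
    s = sigmoid(z)
    d1 = [si * (1.0 - si) for si in s]
    d2 = [si * (1.0 - si) * (1.0 - 2.0 * si) for si in s]
    n = len(z)
    H = [[d1[i] * H_out[i][j] * d1[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        H[i][i] += d2[i] * g_out[i]
    return H

def loss(W, x, b, y):
    a = sigmoid(affine(W, x, b))
    return 0.5 * sum((ai - yi) ** 2 for ai, yi in zip(a, y))

def hessian_blocks(W, x, b, y):
    # Forward pass, then pass gradient and Hessian backward module by module.
    z = affine(W, x, b)
    a = sigmoid(z)
    g_a = [ai - yi for ai, yi in zip(a, y)]  # gradient of the loss w.r.t. a
    n = len(a)
    H_a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # loss Hessian w.r.t. a
    H_z = hbp_sigmoid(z, g_a, H_a)
    H_b = H_z                 # dz/db is the identity, so the bias block equals H_z
    H_x = hbp_linear(W, H_z)  # Hessian w.r.t. the layer input
    return H_b, H_x
```

Each module needs only the Hessian (and, for the nonlinearity, the gradient) with respect to its own output, which is what allows the computation to be assembled layer by layer. In a deeper network, H_x would be handed to the preceding module as its output Hessian.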