Discrete gradient descent differs qualitatively from gradient flow

08/14/2018
by Y. Cooper, et al.

We consider gradient descent on functions of the form L_1 = |f| and L_2 = f^2, where f: R^n → R is any smooth function with 0 as a regular value. We show that gradient descent implemented with a discrete step size τ behaves qualitatively differently from continuous gradient descent. Over long time scales, continuous and discrete gradient descent on L_1 find different minima of L_1, all of which lie on the submanifold M = f^{-1}(0), and we characterize the difference: the minima that tend to be found by discrete gradient descent lie in a secondary critical submanifold M' ⊂ M, the locus within M where the function K = |∇f|^2 |_M is minimized. We explain this behavior, and also study the more subtle behavior of discrete gradient descent on L_2.
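A minimal numerical sketch of the phenomenon described above (the specific function and parameters here are illustrative choices, not taken from the paper): take f(x, y) = x(1 + y^2), so that M = {x = 0} and K = |∇f|^2 |_M = (1 + y^2)^2 is minimized at y = 0. Discrete gradient descent on L_1 = |f| with a moderate step size τ oscillates across M and drifts toward y = 0, while a very small τ, approximating gradient flow, stops near the first point of M it reaches.

```python
def grad_descent_L1(x, y, tau, steps):
    """Discrete gradient descent on L1 = |f| for f(x, y) = x * (1 + y**2).

    The zero set M = {x = 0} is a line of minima of L1, and
    K = |grad f|**2 restricted to M equals (1 + y**2)**2,
    which is minimized at y = 0.
    """
    for _ in range(steps):
        if x == 0:
            break
        s = 1.0 if x > 0 else -1.0          # sign(f), since 1 + y**2 > 0
        gx, gy = 1 + y * y, 2 * x * y       # grad f
        # grad L1 = sign(f) * grad f away from M
        x, y = x - tau * s * gx, y - tau * s * gy
    return x, y

# Moderate step size: iterates bounce back and forth across M,
# and the bouncing drives y toward 0, the minimum of K on M.
xd, yd = grad_descent_L1(1.0, 1.0, tau=0.1, steps=20000)

# Tiny step size approximates gradient flow: the trajectory halts
# near the first point of M it reaches, with y well away from 0.
xc, yc = grad_descent_L1(1.0, 1.0, tau=1e-4, steps=20000)

print(abs(yd), abs(yc))  # discrete drifts to y near 0; flow does not
```

The mechanism is visible in the y-update: each step changes y by -2τ|x|y regardless of the sign of x, so as long as the iterate keeps overshooting across M (which a fixed τ forces), |y| decays toward 0; under the flow, x reaches 0 and y stops moving.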
