Secondary gradient descent in higher codimension

09/14/2018
by Y. Cooper, et al.

In this paper, we analyze discrete gradient descent and ϵ-noisy gradient descent on a special but important class of functions. We find that, when used to minimize a function L: R^n → R in this class, discrete gradient descent can exhibit strikingly different behavior from continuous gradient descent. On long time scales, the two processes tend toward different global minima of L: discrete gradient descent preferentially finds global minima at which the graph of L is shallowest, while gradient flow shows no such preference.
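The contrast described in the abstract can be illustrated with a toy example (our own construction, not taken from the paper): the loss L(x, y) = (1 + x²) y² / 2 has a line of global minima {y = 0}, and its curvature in the y-direction, 1 + x², is shallowest at x = 0. A minimal sketch, assuming nothing beyond plain gradient descent, compares a large finite step size against a small-step approximation of gradient flow:

```python
# Toy loss L(x, y) = (1 + x**2) * y**2 / 2 with a line of minima at y = 0.
# The curvature 1 + x**2 in the y-direction is shallowest at x = 0.

def gradient_descent(x, y, step, n_steps):
    """Run plain gradient descent on L(x, y) = (1 + x**2) * y**2 / 2."""
    for _ in range(n_steps):
        gx = x * y * y            # dL/dx
        gy = (1 + x * x) * y      # dL/dy
        x, y = x - step * gx, y - step * gy
    return x, y

# Large steps: the iterate drifts along the valley of minima toward the
# shallower region near x = 0 before y settles to zero.
x_disc, y_disc = gradient_descent(1.0, 0.5, step=0.9, n_steps=1000)

# Tiny steps approximate gradient flow over the same total time (0.9 * 1000):
# y decays quickly and x barely moves from its starting value.
x_flow, y_flow = gradient_descent(1.0, 0.5, step=0.001, n_steps=900_000)

print(x_disc, x_flow)  # discrete descent ends at noticeably smaller |x|
```

Both runs converge to a point on the minimum set {y = 0}, but the large-step run ends at a visibly smaller |x|, i.e. at a shallower global minimum, which is the qualitative behavior the abstract attributes to discrete gradient descent.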
