Secondary gradient descent in higher codimension
In this paper, we analyze discrete gradient descent and ϵ-noisy gradient descent on a special but important class of functions. We find that, when used to minimize a function L: R^n → R in this class, discrete gradient descent can exhibit strikingly different behavior from continuous gradient descent. On long time scales, discrete gradient descent and continuous gradient descent tend toward different global minima of L. Discrete gradient descent preferentially finds global minima at which the graph of L is shallowest, while gradient flow shows no such preference.
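The flavor of this phenomenon can be seen in a minimal one-dimensional sketch (not the paper's class of functions, and the piecewise loss below is a hypothetical stand-in): a loss with two global minima of equal value, one sharp and one shallow. Discrete gradient descent with step size larger than 2 divided by the curvature is unstable at the sharp minimum and escapes to the shallow one, while a very small step size, approximating gradient flow, converges to whichever minimum the trajectory starts near.

```python
def grad(x):
    # Gradient of a hypothetical toy loss with two global minima at value 0:
    # a sharp minimum at x = 1 (second derivative 16, for x > 0) and a
    # shallow minimum at x = -1 (second derivative 1, for x <= 0).
    return 16.0 * (x - 1.0) if x > 0 else (x + 1.0)

def gradient_descent(x, lr, steps=500):
    # Plain discrete gradient descent: x_{k+1} = x_k - lr * grad(x_k).
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

x0 = 1.3  # start inside the basin of the sharp minimum

# Small step size approximates gradient flow: the iterate stays at the
# sharp minimum, since lr * 16 < 2 makes that fixed point stable.
small_lr = gradient_descent(x0, lr=0.01)

# Larger step size: lr * 16 > 2, so the sharp minimum is unstable for the
# discrete dynamics, and the iterate escapes to the shallow minimum.
large_lr = gradient_descent(x0, lr=0.2)

print(round(small_lr, 3), round(large_lr, 3))  # → 1.0 -1.0
```

The design point of the sketch is that the stability threshold of discrete gradient descent depends on the curvature at a minimum, which is one mechanism by which a finite step size can select shallower minima than gradient flow does.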