Incorporating the Barzilai-Borwein Adaptive Step Size into Subgradient Methods for Deep Network Training

05/27/2022
by Antonio Robles-Kelly, et al.

In this paper, we incorporate the Barzilai-Borwein step size into gradient descent methods used to train deep networks. This allows us to adapt the learning rate using a two-point approximation to the secant equation on which quasi-Newton methods are based. Moreover, the adaptive learning rate method presented here is quite general and can be applied to widely used gradient descent approaches such as Adagrad and RMSprop. We evaluate our method using standard example network architectures on widely available datasets and compare it against alternatives from the literature. In our experiments, our adaptive learning rate shows smoother and faster convergence than the alternatives, with better or comparable performance.
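To make the two-point secant approximation concrete, the following is a minimal sketch of the classical Barzilai-Borwein step size, alpha_k = (s^T s) / (s^T y) with s = x_k - x_{k-1} and y = g_k - g_{k-1}, applied to plain gradient descent. It is not the training procedure from the paper: the toy quadratic objective, the function names (`bb_step_size`, `bb_gradient_descent`, `grad`), the initial step `alpha0`, and the fallback rule for non-positive curvature are all assumptions made for illustration.

```python
# Illustrative sketch only: classical Barzilai-Borwein (BB1) step on a toy
# quadratic. This is NOT the paper's method for deep networks.

def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def bb_step_size(s, y, fallback=1e-3, eps=1e-12):
    """BB1 step alpha = (s.s) / (s.y), where s is the change in the iterate
    and y is the change in the gradient between two consecutive points."""
    sy = dot(s, y)
    if sy <= eps:
        # Assumption: when the curvature estimate is not safely positive,
        # fall back to a fixed step instead of dividing by ~0.
        return fallback
    return dot(s, s) / sy

def grad(x):
    """Gradient of the hypothetical objective f(x) = 0.5*(x0^2 + 10*x1^2)."""
    return [1.0 * x[0], 10.0 * x[1]]

def bb_gradient_descent(x0, steps=50, alpha0=0.01):
    """Gradient descent whose step size is re-estimated at every iteration
    from the two most recent iterates and gradients."""
    x_prev = list(x0)
    g_prev = grad(x_prev)
    # The first step has no history yet, so it uses the fixed step alpha0.
    x = [xi - alpha0 * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(steps):
        g = grad(x)
        s = [a - b for a, b in zip(x, x_prev)]
        y = [a - b for a, b in zip(g, g_prev)]
        alpha = bb_step_size(s, y, fallback=alpha0)
        x_prev, g_prev = x, g
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

if __name__ == "__main__":
    print(bb_gradient_descent([1.0, 1.0]))
```

In a stochastic deep-learning setting, the gradient differences are noisy, so a practical variant would likely need smoothing or safeguards beyond the simple fallback used here; the sketch only shows how the step size is derived from two consecutive points.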
