Stochastic gradient descent (SGD) is the simplest deep learning optimize...
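The abstract is cut off here, but for reference, the vanilla SGD update it opens with is just the following. A minimal NumPy sketch; the quadratic toy objective is an illustrative assumption, not from the paper:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """One vanilla SGD step: move against the (stochastic) gradient."""
    return w - lr * grad

# Toy run on f(w) = ||w||^2 / 2, whose gradient is w itself.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sgd_step(w, grad=w, lr=0.1)
print(w)  # approaches the minimizer at the origin
```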
This paper presents modified memoryless quasi-Newton methods based on th...
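The truncated text does not reveal which modification the paper introduces, so the following is only a generic sketch of the memoryless idea: the inverse-Hessian approximation is rebuilt each iteration from the identity and the most recent step/gradient-change pair, so no matrix is ever formed or stored. Plain memoryless BFGS is shown; the paper's modified formula may differ.

```python
import numpy as np

def memoryless_bfgs_direction(g, s, y):
    """Search direction d = -H g, where H is the BFGS update of the
    identity using only the latest pair s = x_k - x_{k-1} and
    y = g_k - g_{k-1}; only vector operations are needed."""
    rho = 1.0 / np.dot(y, s)
    a = g - rho * y * np.dot(s, g)                  # (I - rho y s^T) g
    Hg = a - rho * s * np.dot(y, a) + rho * s * np.dot(s, g)
    return -Hg

# Illustrative vectors only (a real method pairs this with a line search).
g = np.array([1.0, 10.0])
s = np.array([0.1, -0.05])
y = np.array([0.1, -0.5])
print(memoryless_bfgs_direction(g, s, y))
```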
Practical results have shown that deep learning optimizers using small c...
Convergence and convergence rate analyses of adaptive methods, such as A...
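The named method is cut off, but Adam is the archetypal adaptive method this line refers to; a minimal NumPy sketch of its update, with the common default hyperparameters assumed:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step: bias-corrected first/second moment estimates
    rescale the gradient coordinate-wise."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy run on f(w) = ||w||^2 / 2 (gradient is w); illustrative only.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 501):
    w, m, v = adam_step(w, w, m, v, t, lr=0.05)
print(w)  # near the origin
```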
While the generative model has many advantages, it is not feasible to ca...
Previous numerical results have shown that a two time-scale update rule ...
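The sentence is truncated, but the two time-scale update rule (TTUR, Heusel et al., 2017) simply trains the discriminator and generator with different learning rates. A minimal PyTorch sketch; the stand-in linear networks and the specific rates are illustrative assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

G = nn.Linear(16, 32)   # stand-in "generator"
D = nn.Linear(32, 1)    # stand-in "discriminator"

# TTUR: the two players get different learning rates (time scales).
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.5, 0.999))

bce = nn.BCEWithLogitsLoss()
z = torch.randn(8, 16)
real = torch.randn(8, 32)

# One discriminator step at the faster time scale ...
opt_D.zero_grad()
d_loss = (bce(D(real), torch.ones(8, 1))
          + bce(D(G(z).detach()), torch.zeros(8, 1)))
d_loss.backward()
opt_D.step()

# ... and one generator step at the slower one.
opt_G.zero_grad()
g_loss = bce(D(G(z)), torch.ones(8, 1))
g_loss.backward()
opt_G.step()
```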
Numerical evaluations have definitively shown that, for deep learning op...
Recently, convergence as well as convergence rate analyses of deep learn...
This paper proposes a conjugate-gradient-based Adam algorithm blending A...
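The description is cut short, so the paper's exact update cannot be recovered from this text. One natural reading of "blending Adam with conjugate gradient" is to feed a conjugate-gradient-style direction, rather than the raw gradient, into Adam's moment estimates; the sketch below does that with a Fletcher-Reeves coefficient, purely as an assumption:

```python
import numpy as np

def cg_adam_step(w, grad, d_prev, g_prev, m, v, t,
                 lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Hedged sketch: a conjugate-gradient direction replaces the raw
    gradient inside an otherwise standard Adam update. The Fletcher-
    Reeves coefficient is this sketch's choice, not the paper's."""
    beta_fr = np.dot(grad, grad) / (np.dot(g_prev, g_prev) + eps)
    d = -grad + beta_fr * d_prev        # conjugate-gradient-style direction
    m = b1 * m + (1 - b1) * d           # first moment of the direction
    v = b2 * v + (1 - b2) * d ** 2      # second moment
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # d already points downhill, so the step is added, not subtracted.
    return w + lr * m_hat / (np.sqrt(v_hat) + eps), d, m, v

# Toy run on f(w) = ||w||^2 / 2 (gradient is w); illustrative only.
w = np.array([1.0, -2.0])
d_prev = np.zeros_like(w)
g_prev = np.ones_like(w)
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 301):
    g = w
    w, d_prev, m, v = cg_adam_step(w, g, d_prev, g_prev, m, v, t, lr=0.05)
    g_prev = g
print(w)  # near the origin
```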