Stochastic Nested Variance Reduction for Nonconvex Optimization
We study finite-sum nonconvex optimization problems, where the objective function is an average of n nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with the conventional stochastic variance reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses K+1 nested reference points to build a semi-stochastic gradient whose variance is further reduced in each iteration. For smooth nonconvex functions, the proposed algorithm converges to an ϵ-approximate first-order stationary point (i.e., ‖∇ F(x)‖_2 ≤ ϵ) within Õ(n∧ϵ^-2 + ϵ^-3∧n^1/2 ϵ^-2) stochastic gradient evaluations, where ∧ denotes the minimum. This improves the best known gradient complexity of SVRG, O(n + n^2/3 ϵ^-2), and that of SCSG, O(n∧ϵ^-2 + ϵ^-10/3∧n^2/3 ϵ^-2). For gradient dominated functions, our algorithm also achieves a better gradient complexity than the state-of-the-art algorithms.
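For intuition, the following is a schematic sketch of the two estimators being compared; the notation (index sets I_l, reference points x^(l)) is illustrative and not necessarily the paper's exact definition. The classical SVRG semi-stochastic gradient at iterate x_t uses a single reference point x̃ (so two points in total),

v_t = ∇ f_{i_t}(x_t) − ∇ f_{i_t}(x̃) + ∇ F(x̃),

whereas a nested estimator of the kind described above maintains reference points x^(0), …, x^(K) (with x^(K) the current iterate) together with sampled index sets I_0, …, I_K, and can be written as

v_t = (1/|I_0|) ∑_{i∈I_0} ∇ f_i(x^(0)) + ∑_{l=1}^{K} (1/|I_l|) ∑_{i∈I_l} [ ∇ f_i(x^(l)) − ∇ f_i(x^(l−1)) ],

so that each nested level adds a correction to the coarser estimate from the level above it. In this schematic, taking K = 1 with I_0 = {1, …, n} and a single sampled index in I_1 recovers the SVRG estimator shown first.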