Non-ergodic linear convergence of delayed gradient descent under strong convexity and the Polyak-Łojasiewicz condition
In this work, we establish a linear convergence estimate for gradient descent with delay τ∈ℕ when the cost function is μ-strongly convex and L-smooth. This result improves upon the well-known estimates of Arjevani et al. <cit.> and Stich-Karimireddy <cit.> in that it is non-ergodic and holds under a weaker assumption on the cost function. Moreover, the admissible range of the learning rate η is extended from η ≤ 1/(10Lτ) to η ≤ 1/(4Lτ) for τ = 1 and to η ≤ 3/(10Lτ) for τ ≥ 2, where L > 0 is the Lipschitz constant of the gradient of the cost function. We further show linear convergence of the cost function under the Polyak-Łojasiewicz (PL) condition, for which the admissible learning rate improves to η ≤ 9/(10Lτ) for large delay τ. Finally, numerical experiments are provided to confirm the theoretical results.
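As a minimal illustrative sketch (not the paper's code), the delayed gradient descent iteration x_{k+1} = x_k - η ∇f(x_{k-τ}) can be run on a simple strongly convex quadratic f(x) = ½ xᵀAx, where μ and L are the smallest and largest eigenvalues of A. The matrix A, the iterate history handling, and the iteration count below are assumptions chosen for demonstration; the step size η = 3/(10Lτ) follows the range stated in the abstract for τ ≥ 2.

```python
import numpy as np

def delayed_gd(A, x0, eta, tau, iters):
    """Delayed gradient descent on f(x) = 0.5 * x^T A x.

    Uses the stale iterate x_{k - tau} to form the gradient;
    for k < tau, the delayed iterate is clipped to x_0.
    """
    xs = [x0.copy()]  # history of iterates
    for k in range(iters):
        x_delayed = xs[max(k - tau, 0)]  # stale iterate used for the gradient
        grad = A @ x_delayed             # grad f(x) = A x for this quadratic
        xs.append(xs[-1] - eta * grad)
    return xs

# Illustrative setup (assumed): mu = 1, L = 10, delay tau = 2,
# learning rate eta = 3/(10*L*tau) as allowed by the abstract for tau >= 2.
A = np.diag([1.0, 10.0])
L, tau = 10.0, 2
eta = 3.0 / (10.0 * L * tau)
xs = delayed_gd(A, np.array([1.0, 1.0]), eta, tau, 1000)

# ||x_k|| should decay geometrically toward the minimizer x* = 0.
print(np.linalg.norm(xs[-1]))
```

Despite the stale gradients, the iterates contract geometrically toward the minimizer, consistent with the linear (non-ergodic) convergence claimed for this step-size range.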