Improved Convergence Rates for Non-Convex Federated Learning with Compression
Federated learning is a new distributed learning paradigm that enables efficient training of emerging large-scale machine learning models. In this paper, we consider federated learning on non-convex objectives with compressed communication from the clients to the central server. We propose a novel first-order algorithm (FedSTEPH2) that employs compressed communication and achieves the optimal iteration complexity of 𝒪(1/ϵ^1.5) to reach an ϵ-stationary point (i.e., 𝔼[‖∇ f(x)‖²] ≤ ϵ) on smooth non-convex objectives. The proposed scheme is the first algorithm that attains this optimal complexity with compressed communication and without requiring full client gradients at each communication round. The key idea that enables FedSTEPH2 to attain this optimal complexity is the use of judicious momentum terms in both the local client updates and the global server update. As a prequel to FedSTEPH2, we propose FedSTEPH, which involves a momentum term only in the local client updates. We establish that FedSTEPH enjoys improved convergence rates under various non-convex settings (such as the Polyak-Łojasiewicz condition) and with fewer assumptions than prior work.
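To make the high-level description concrete, below is a minimal single-machine sketch of one pattern the abstract describes: a federated round with compressed client-to-server communication and momentum in both the local client updates and the global server update. This is not the paper's FedSTEPH2 pseudocode; the top-k compressor, the toy least-squares objective, and all names and hyperparameters (top_k, client_grad, eta_local, beta, etc.) are illustrative assumptions.

```python
# Hypothetical sketch: compressed federated rounds with local and server momentum.
# Not the authors' FedSTEPH2; compressor, objective, and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def top_k(v, k):
    """Keep only the k largest-magnitude entries of v (a simple compressor)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def client_grad(x, a, b):
    """Stochastic gradient of a toy least-squares objective 0.5 * ||a @ x - b||^2."""
    i = rng.integers(len(b))
    return a[i] * (a[i] @ x - b[i])

d, n_clients, k = 20, 5, 5
data = [(rng.normal(size=(50, d)), rng.normal(size=50)) for _ in range(n_clients)]

x = np.zeros(d)              # global model held by the server
server_mom = np.zeros(d)     # server-side momentum buffer
eta_local, eta_global, beta = 0.05, 1.0, 0.9

for rnd in range(200):
    deltas = []
    for a, b in data:
        y = x.copy()
        local_mom = np.zeros(d)          # client-side momentum buffer
        for _ in range(10):              # local steps with momentum
            g = client_grad(y, a, b)
            local_mom = beta * local_mom + (1 - beta) * g
            y -= eta_local * local_mom
        deltas.append(top_k(y - x, k))   # compress the model update sent to the server
    avg_delta = np.mean(deltas, axis=0)  # server aggregates the compressed updates
    server_mom = beta * server_mom + avg_delta  # momentum in the global server update
    x += eta_global * server_mom
```

The point of the sketch is the structure, not the rates: each client runs momentum-accelerated local steps, only a compressed update leaves the client, and the server applies its own momentum to the aggregated update, which is the combination the abstract credits for the improved complexity.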