Stochastic Bilevel Distributed Optimization over a Network
Bilevel optimization has been applied to a wide variety of machine learning models. Numerous stochastic bilevel optimization algorithms have been developed in recent years. However, most of them restrict their focus on the single-machine setting so that they are incapable of handling the distributed data. To address this issue, under the setting where all participants compose a network and perform the peer-to-peer communication in this network, we developed two novel distributed stochastic bilevel optimization algorithms based on the gradient tracking communication mechanism and two different gradient estimators. Additionally, we show that they can achieve O(1/ϵ^2(1-λ)^2) and O(1/ϵ^3/2(1-λ)^2) convergence rate respectively to obtain the ϵ-accuracy solution, where 1-λ denotes the spectral gap of the communication network. To our knowledge, this is the first work achieving these theoretical results. Finally, we applied our algorithms to practical machine learning models, and the experimental results confirmed the efficacy of our algorithms.
READ FULL TEXT