On the Convergence of Federated Optimization in Heterogeneous Networks
The burgeoning field of federated learning involves training machine learning models in massively distributed networks, and requires the development of novel distributed optimization techniques. Federated averaging (FedAvg) is the leading optimization method for training non-convex models in this setting, exhibiting impressive empirical performance. However, the behavior of FedAvg is not well understood, particularly when considering data heterogeneity across devices in terms of sample sizes and underlying data distributions. In this work, we ask the following two questions: (1) Can we gain a principled understanding of FedAvg in realistic federated settings? (2) Given our improved understanding, can we devise an improved federated optimization algorithm? To this end, we propose FedProx, which is similar in spirit to FedAvg, but more amenable to theoretical analysis. We characterize the convergence of FedProx under a novel device similarity assumption.
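To make the relationship between the two methods concrete, below is a minimal sketch of one communication round, using a toy least-squares objective rather than the general non-convex losses the paper considers. Setting `mu=0` gives a FedAvg-style round (local updates followed by a sample-size-weighted average); `mu>0` adds a proximal term penalizing drift from the global model, in the spirit of FedProx. Function names, the learning rate, and the toy data are illustrative assumptions, not the paper's code.

```python
import numpy as np

def local_update(w_global, X, y, mu=0.0, lr=0.1, epochs=5):
    """One device's local update on a least-squares objective.

    mu = 0 recovers plain local gradient steps (FedAvg-style);
    mu > 0 adds the gradient of a proximal term
    (mu / 2) * ||w - w_global||^2, which penalizes drift from the
    current global model (FedProx-style). Illustrative objective only.
    """
    w = w_global.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares loss gradient
        grad += mu * (w - w_global)        # proximal-term gradient
        w -= lr * grad
    return w

def federated_round(w_global, devices, mu=0.0):
    """One communication round: each device updates locally, then the
    server averages models weighted by local sample counts."""
    updates, sizes = [], []
    for X, y in devices:
        updates.append(local_update(w_global, X, y, mu=mu))
        sizes.append(len(y))
    weights = np.array(sizes) / sum(sizes)
    return sum(wk * uk for wk, uk in zip(weights, updates))

# Toy heterogeneous network: devices differ in sample size and in the
# underlying data distribution, mimicking the setting studied here.
rng = np.random.default_rng(0)
devices = []
for n in (20, 200, 50):
    X = rng.normal(size=(n, 3))
    w_true = rng.normal(size=3)  # per-device ground-truth model
    devices.append((X, X @ w_true + 0.1 * rng.normal(size=n)))

w = np.zeros(3)
for t in range(30):
    w = federated_round(w, devices, mu=0.5)  # mu=0.0 for FedAvg-style
print("global model after 30 rounds:", w)
```

The added proximal term is what makes the local subproblems better behaved under heterogeneity, which is the property the convergence analysis exploits.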