Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks. To alleviate it, various gradient compression methods have been proposed, among which sign-based algorithms have attracted growing interest. However, SIGNSGD fails to converge in the presence of data heterogeneity, which is commonly observed in the emerging federated learning (FL) paradigm. Error feedback has been proposed to address this non-convergence issue. Nonetheless, it requires each worker to keep track of its compression errors locally, which makes it unsuitable for FL, since workers may not participate throughout the entire training process. In this paper, we propose a magnitude-driven sparsification scheme that addresses the non-convergence issue of SIGNSGD while further improving communication efficiency. Moreover, a local update scheme is incorporated to improve the learning performance, and the convergence of the proposed method is established. The effectiveness of the proposed scheme is validated through experiments on the Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
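The non-convergence under heterogeneity can be illustrated with a small numerical sketch. The code below is not the scheme proposed in the paper, whose details are not given in the abstract; it assumes a simple hypothetical ternary compressor (`magnitude_sparsify`) that transmits the sign of a coordinate with probability proportional to its magnitude and zero otherwise. The function names, the `bound` parameter, and the toy gradient values are all assumptions made for illustration.

```python
import random


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def signsgd_aggregate(worker_grads):
    """Average the signs of the workers' scalar gradients (magnitudes are discarded)."""
    return sum(sign(g) for g in worker_grads) / len(worker_grads)


def magnitude_sparsify(g, bound, rng):
    """Hypothetical ternary compressor (an assumption, not the paper's method):
    send sign(g) with probability |g| / bound, otherwise send 0.
    In expectation the output equals g / bound, so magnitude information survives."""
    p = min(abs(g) / bound, 1.0)
    return sign(g) if rng.random() < p else 0


def sparsified_aggregate(worker_grads, bound, rng):
    """Average the ternary messages from all workers."""
    msgs = [magnitude_sparsify(g, bound, rng) for g in worker_grads]
    return sum(msgs) / len(msgs)


# Heterogeneous workers: one large positive gradient, two small negative ones.
worker_grads = [4.0, -1.0, -1.0]
true_mean = sum(worker_grads) / len(worker_grads)  # positive

# Plain sign aggregation follows the majority and points the wrong way.
sign_direction = signsgd_aggregate(worker_grads)  # negative

# Averaged over many rounds, the magnitude-aware messages track true_mean / bound.
rng = random.Random(0)
bound = 4.0  # assumed upper bound on |g|
trials = 20000
avg_sparsified = sum(
    sparsified_aggregate(worker_grads, bound, rng) for _ in range(trials)
) / trials

print("true mean:", true_mean)
print("sign aggregate:", sign_direction)
print("sparsified aggregate (average):", avg_sparsified)
```

In this toy case the sign aggregate disagrees in direction with the true mean gradient, while the magnitude-aware messages agree with it on average. Many of those messages are also zero, which is where the additional communication saving would come from under this assumed compressor.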