Performance of Low Synchronization Orthogonalization Methods in Anderson Accelerated Fixed Point Solvers
Anderson Acceleration (AA) is a method to accelerate the convergence of fixed point iterations for nonlinear, algebraic systems of equations. Due to the requirement of solving a least squares problem at each iteration and a reliance on modified Gram-Schmidt for updating the iteration space, AA requires extra costly synchronization steps for global reductions. Moreover, the number of reductions in each iteration depends on the size of the iteration space. In this work, we introduce three low synchronization orthogonalization algorithms into AA within SUNDIALS that reduce the total number of global reductions per iteration to a constant of 2 or 3, independent of the size of the iteration space. A performance study demonstrates the reduced time required by the new algorithms at large processor counts with CPUs and demonstrates the predicted performance on multi-GPU architectures. Most importantly, we provide convergence and timing data for multiple numerical experiments to demonstrate reliability of the algorithms within AA and improved performance at parallel strong-scaling limits.
READ FULL TEXT