research
∙
07/09/2020
AdaScale SGD: A User-Friendly Algorithm for Distributed Training
When using large-batch training to speed up stochastic gradient descent,...
research
∙
07/20/2018