Deep Neural Networks (DNNs) have been a large driver and enabler for AI ...
Feature normalization transforms such as Batch and Layer-Normalization h...
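The two normalization transforms named above differ only in which axis the statistics are computed over. A minimal NumPy sketch (illustrative, not taken from the cited work) makes the contrast concrete:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Layer norm: normalize each sample across its feature dimension
    # (last axis), so every row has zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    # Batch norm: normalize each feature across the batch dimension
    # (first axis), using statistics shared by all samples in the batch.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(4, 8) * 3.0 + 1.0
ln = layer_norm(x)   # per-row statistics
bn = batch_norm(x)   # per-column statistics
```

Trainable scale and shift parameters (gamma, beta) are omitted here for brevity; both transforms typically apply them after the normalization step.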
We present a framework for using transformer networks as universal compu...
Fine-tuning pretrained language models (LMs) without making any architec...
Word translation without parallel corpora has become feasible, rivaling ...
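One widely used approach to this problem (named here for illustration; the abstract does not specify the method) aligns two independently trained monolingual embedding spaces with an orthogonal map, solved in closed form as an orthogonal Procrustes problem. A toy sketch, where the "target language" space is an exact rotation of the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy monolingual embeddings: Y is a rotated copy of X, standing in for
# independently trained word vectors of two languages with a shared structure.
X = rng.normal(size=(50, 4))                  # "source language" vectors
Q_true, _ = np.linalg.qr(rng.normal(size=(4, 4)))
Y = X @ Q_true                                # "target language" vectors

# Orthogonal Procrustes: W = argmin ||XW - Y||_F over orthogonal W,
# recovered in closed form from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

mapped = X @ W   # translated source vectors, ready for nearest-neighbor lookup
```

In practice the correspondence between X and Y rows is unknown, which is where unsupervised seed-dictionary induction comes in; this sketch only shows the alignment step once a correspondence is assumed.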
In distributed learning, local SGD (also known as federated averaging) a...
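The communication pattern of local SGD can be sketched in a few lines: each worker takes several local gradient steps, and only the resulting models are averaged. The quadratic per-worker losses below are a toy stand-in chosen so the global optimum (the mean of the worker targets) is known:

```python
import numpy as np

def local_sgd(worker_targets, rounds=50, local_steps=10, lr=0.1):
    """Minimal local SGD / federated averaging on toy quadratic losses.

    Worker i minimizes f_i(w) = 0.5 * ||w - t_i||^2, so the optimum of the
    averaged objective is the mean of the targets t_i.
    """
    w = np.zeros_like(worker_targets[0])
    for _ in range(rounds):
        local_models = []
        for t in worker_targets:
            w_i = w.copy()
            for _ in range(local_steps):
                grad = w_i - t            # gradient of 0.5 * ||w_i - t||^2
                w_i -= lr * grad
            local_models.append(w_i)
        w = np.mean(local_models, axis=0)  # communication round: average models
    return w

targets = [np.array([1.0, 0.0]), np.array([3.0, 2.0]), np.array([5.0, 4.0])]
w_final = local_sgd(targets)
```

Communication happens once per round rather than once per gradient step, which is the bandwidth saving that motivates the method.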
A recent work by Ramanujan et al. (2020) provides significant empirical ...
It is well known that modern deep neural networks are powerful enough to...
A recent line of ground-breaking results for permutation-based SGD has c...
Due to its decentralized nature, Federated Learning (FL) lends itself to...
The strong lottery ticket hypothesis (LTH) postulates that one can appro...
Stochastic gradient descent without replacement sampling is widely used ...
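Without-replacement sampling (often called random reshuffling) visits every example exactly once per epoch in a freshly shuffled order, in contrast to i.i.d. with-replacement sampling. A minimal sketch on noiseless least squares, with illustrative names:

```python
import numpy as np

def sgd_random_reshuffling(X, y, epochs=50, lr=0.05, seed=0):
    # Without-replacement SGD: each epoch is a fresh permutation of the
    # examples, so no example is seen twice before all have been seen once.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]   # squared-loss gradient at example i
            w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                     # noiseless targets, so w_true is recoverable
w_hat = sgd_random_reshuffling(X, y)
```

The only change from with-replacement SGD is replacing an i.i.d. index draw with `rng.permutation` per epoch; the convergence analyses the text refers to study exactly this difference.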
To improve the resilience of distributed training to worst-case, or Byza...
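One standard Byzantine-robust defense (shown here as an illustrative example, not necessarily the mechanism of the cited work) replaces the usual gradient mean with a coordinate-wise median, which a minority of arbitrarily corrupted workers cannot drag far from the honest updates:

```python
import numpy as np

def coordinate_median(grads):
    # Coordinate-wise median: a simple robust aggregator that tolerates a
    # minority of arbitrarily corrupted (Byzantine) gradient vectors.
    return np.median(np.stack(grads), axis=0)

honest = [np.array([1.0, 1.0]), np.array([1.1, 0.9]), np.array([0.9, 1.1])]
byzantine = [np.array([1e6, -1e6])]            # worst-case corrupted update

agg_mean = np.mean(honest + byzantine, axis=0)  # mean is destroyed by one outlier
agg_med = coordinate_median(honest + byzantine) # median stays near the honest updates
```

The same interface accommodates other robust aggregators (trimmed mean, Krum, and so on); the median is the shortest to state.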
Adversarial training is a technique for training robust machine learning...
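Adversarial training can be sketched as a min-max loop: at each step, perturb the inputs to maximize the loss, then take a gradient step on the perturbed batch. The sketch below uses a single-step FGSM attack on a logistic-regression model; all names and the toy data are illustrative assumptions, not details from the abstract:

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    # Fast Gradient Sign Method: move each input in the direction that
    # increases the logistic loss, inside an L_inf ball of radius eps.
    margin = np.clip(y * (x @ w + b), -50, 50)
    grad_x = -y[:, None] * w * (1.0 / (1.0 + np.exp(margin)))[:, None]
    return x + eps * np.sign(grad_x)

def adversarial_train(X, y, eps=0.1, lr=0.1, steps=200):
    # Min-max training: fit the model on adversarially perturbed inputs
    # instead of clean ones (inner max approximated by one FGSM step).
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1]) * 0.01
    b = 0.0
    for _ in range(steps):
        X_adv = fgsm_perturb(X, w, b, y, eps)
        margin = np.clip(y * (X_adv @ w + b), -50, 50)
        coef = -y / (1.0 + np.exp(margin))        # d(logistic loss)/d(logit)
        w -= lr * (coef[:, None] * X_adv).mean(axis=0)
        b -= lr * coef.mean()
    return w, b

# Toy linearly separable data with labels y in {-1, +1}.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(2.0, 0.3, size=(50, 2)),
                    rng.normal(-2.0, 0.3, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w, b = adversarial_train(X, y)
X_adv = fgsm_perturb(X, w, b, y, eps=0.1)
robust_acc = np.mean(np.sign(X_adv @ w + b) == y)
```

Stronger inner maximizers (e.g. multi-step PGD) slot into the same loop in place of the single FGSM step.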
Data augmentation (DA) is commonly used during model training, as it sig...