In recent years, large-scale models have demonstrated state-of-the-art p...
In recent years, the number of parameters of one deep learning (DL) mode...
Large transformer models display promising performance on a wide range o...
Deep learning recommendation models (DLRMs) have been widely applied in ...
The pre-trained model (PTM) is revolutionizing artificial intelligence (...
The transformer is the most critical algorithm innovation of the Nature ...
This paper reports our efforts on swCaffe, a highly efficient parallel f...
Data parallelism has already become a dominant method to scale Deep Neur...