Training algorithms, broadly construed, are an essential part of every d...
Adaptive regularization methods that exploit more than the diagonal entr...
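The snippet above is truncated, but it clearly concerns adaptive methods that precondition with more than the diagonal of the gradient second-moment matrix. Purely as an illustrative sketch (in the spirit of Kronecker-factored preconditioners such as Shampoo, not the paper's own implementation), one such update for a 2-D weight matrix looks like:

    import numpy as np

    def inv_fourth_root(mat, eps=1e-4):
        # (mat + eps*I)^(-1/4) for a symmetric PSD matrix, via eigendecomposition.
        w, v = np.linalg.eigh(mat + eps * np.eye(mat.shape[0]))
        return (v * w ** -0.25) @ v.T

    def kron_factored_step(W, G, L, R, lr=0.1):
        # L (m x m) and R (n x n) accumulate row- and column-space gradient
        # statistics, capturing correlations a diagonal method discards.
        L += G @ G.T
        R += G.T @ G
        return W - lr * inv_fourth_root(L) @ G @ inv_fourth_root(R), L, R

Initialize L = np.zeros((m, m)) and R = np.zeros((n, n)); the two inverse fourth roots together stand in for a full-matrix inverse square root at a fraction of its cost.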
In this work, we propose a novel approach for layerwise representation learning...
For industrial-scale advertising systems, prediction of ad click-through...
Transformer models have recently emerged as one of the foundational mode...
We present the surprising result that randomly initialized neural networ...
Optimizers like Adam and AdaGrad have been very successful in training l...
Multi-task learning can leverage information learned by one task to bene...
In this work, we study the large-scale pretraining of BERT-Large with di...
Optimization in machine learning, both theoretical and applied, is prese...
We study a local loss construction approach for optimizing neural networ...
There is a growing discrepancy in computer vision between large-scale mo...
Recently the LARS and LAMB optimizers have been proposed for training ne...
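For readers unfamiliar with what these optimizers do, a hedged one-function sketch of the LARS-style layerwise "trust ratio" (parameter values arbitrary, and a schematic rendering rather than any paper's code):

    import numpy as np

    def lars_update(w, g, base_lr, weight_decay=1e-4, trust_coef=0.001):
        # LARS scales each layer's step by ||w|| / ||g + wd*w||, so layers whose
        # gradients are small relative to their weights still make progress at
        # large batch sizes.
        g = g + weight_decay * w
        w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
        trust = trust_coef * w_norm / (g_norm + 1e-9) if w_norm > 0 else 1.0
        return w - base_lr * trust * g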
State-of-the-art optimization is steadily shifting towards massively par...
We investigate several confounding factors in the evaluation of optimiza...
We introduce a temperature into the exponential function and replace the...
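The tempered exponential and logarithm this sentence alludes to are compact enough to state; a minimal NumPy sketch (both reduce to the ordinary exp and log as the temperature t approaches 1):

    import numpy as np

    def log_t(x, t):
        # Tempered logarithm; equals log(x) at t = 1.
        return np.log(x) if t == 1.0 else (x ** (1.0 - t) - 1.0) / (1.0 - t)

    def exp_t(x, t):
        # Tempered exponential; equals exp(x) at t = 1. For t > 1 the tail is
        # heavier than exp, which is what bounds the loss on outliers.
        if t == 1.0:
            return np.exp(x)
        return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))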
Lingvo is a TensorFlow framework offering a complete solution for collaborative...
Adaptive gradient-based optimizers such as AdaGrad and Adam are among th...
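Since the abstract is cut off, for reference, the per-coordinate updates these two optimizers perform take only a few lines each (a textbook rendering, not the paper's code):

    import numpy as np

    def adagrad_step(w, g, accum, lr=0.1, eps=1e-8):
        # AdaGrad divides each coordinate by the root of its summed squared gradients.
        accum += g * g
        return w - lr * g / (np.sqrt(accum) + eps), accum

    def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # Adam keeps bias-corrected running means of the gradient and its square.
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v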
TensorFlow Ranking is the first open source library for solving large-sc...
Techniques such as ensembling and distillation promise model quality improvements...
Generalized linear models with nonlinear feature transformations are wid...
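To make the combination concrete: such systems typically sum a sparse linear ("wide") logit with the output of an embedding MLP ("deep") before the sigmoid. A toy NumPy sketch, with every shape and name invented for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(100, 4))                  # embedding table, feature ids 0..99
    W1, b1 = rng.normal(size=(8, 12)), np.zeros(8)   # expects 3 features x 4 dims
    W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)
    wide_w = rng.normal(size=100)                    # one linear weight per (crossed) feature id

    def wide_and_deep_logit(feature_ids):
        wide = wide_w[feature_ids].sum()             # wide: sparse linear model
        h = emb[feature_ids].reshape(-1)             # deep: concatenated embeddings
        h = np.maximum(W1 @ h + b1, 0.0)             # ReLU hidden layer
        return wide + (W2 @ h + b2)[0]               # logits summed; sigmoid at the loss

    p = 1.0 / (1.0 + np.exp(-wide_and_deep_logit(np.array([3, 42, 7]))))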