Recent studies of gradient descent with large step sizes have shown that...
Although learning in high dimensions is commonly believed to suffer from...
We introduce repriorisation, a data-dependent reparameterisation which t...
Stochastic gradient descent (SGD) is a pillar of modern machine learning...
As modern machine learning models continue to advance the computational ...
We develop a stochastic differential equation, called homogenized SGD, f...
A significant obstacle in the development of robust machine learning mod...
Classical learning theory suggests that the optimal generalization perfo...
Modern deep learning models have achieved great success in predictive ac...
The softmax function combined with a cross-entropy loss is a principled ...
Modern deep learning models employ considerably more parameters than req...
We perform a careful, thorough, and large scale empirical study of the c...
Recent work has shown that the prior over functions induced by a deep Ba...
The selection of initial parameter values for gradient-based optimizatio...
A fundamental goal in deep learning is the characterization of trainabil...
One of the distinguishing characteristics of modern deep learning system...
We develop a mean field theory for batch normalization in fully-connecte...
A longstanding goal in deep learning research has been to precisely char...
Training recurrent neural networks (RNNs) on long sequence tasks is plag...
There is a previously identified equivalence between wide fully connecte...
Recurrent neural networks have gained widespread use in modeling sequenc...
In recent years, state-of-the-art methods in computer vision have utiliz...
Recent work has shown that tight concentration of the entire spectrum of...
In practice it is often found that large over-parameterized neural netwo...
Many important problems are characterized by the eigenvalues of a large ...
It is well known that the initialization of weights in deep neural netwo...
A deep fully-connected neural network with an i.i.d. prior over its para...
A number of recent papers have provided evidence that practical design q...