We study the implicit bias of batch normalization trained by gradient de...
Gradient regularization, as described in <cit.>, is a highly effective t...
Mixup, a simple data augmentation method that randomly mixes two data po...
This paper considers the problem of learning a single ReLU neuron with s...
We study linear regression under covariate shift, where the marginal dis...
Stochastic gradient descent (SGD) has achieved great success due to its ...
Stochastic gradient descent (SGD) has been demonstrated to generalize we...
Adaptive gradient methods such as Adam have gained increasing popularity...
Stochastic gradient descent (SGD) exhibits strong algorithmic regulariza...
We consider a binary classification problem when the data comes from a m...
We analyze the properties of adversarial training for learning adversari...
There is an increasing realization that algorithmic inductive biases are...
Understanding the algorithmic regularization effect of stochastic gradie...
We establish a new convergence analysis of stochastic gradient Langevin ...
We study the convergence of gradient descent (GD) and stochastic gradien...
A recent line of research on deep learning focuses on the extremely over...
Graph convolutional networks (GCNs) have recently received wide attentio...
As an important Markov Chain Monte Carlo (MCMC) method, stochastic gradi...
A recent line of research has shown that gradient-based algorithms with ...
We study the problem of training deep neural networks with Rectified Lin...
We propose a fast stochastic Hamilton Monte Carlo (HMC) method, for samp...
We propose a family of nonconvex optimization algorithms that are able t...