We present a noisy channel generative model of two sequences, for exampl...
Deep learning model (primarily convolutional networks and LSTM) for time...
The question of generalization in machine learning---how algorithms are ...
Large over-parametrized models learned via stochastic gradient descent (...
We apply a fast kernel method for mask-based single-channel speech
enhan...
In recent years machine learning methods that nearly interpolate the dat...
Generalization performance of classifiers in deep learning has recently
...
Stochastic Gradient Descent (SGD) with small mini-batch is a key compone...
In this paper we first identify a basic limitation in gradient descent-b...