Despite the recent success of stochastic gradient descent in deep learni...
Recent large-scale neural autoregressive sequence models have shown
impr...
In natural language processing, it has been observed recently that
gener...
Although stochastic gradient descent (SGD) is a driving force behind the...