In this work, we investigate the dynamics of stochastic gradient descent...
One way of introducing sparsity into deep networks is by attaching an
ex...
It is well established that increasing scale in deep transformer network...
In this work we establish an algorithm and distribution independent
non-...
In machine learning, we traditionally evaluate the performance of a sing...
To understand how deep learning works, it is crucial to understand the
t...
In this paper, we aim to learn a low-dimensional Euclidean representatio...