The loss landscape of overparameterized neural networks
We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori one might imagine that the loss function looks like a typical function from R^n to R: nonconvex, with isolated global minima. In this paper we prove that, in at least one important respect, the loss function of an overparameterized neural network does not look like a typical function. If a neural net has n parameters and is trained on d data points, with n > d, we show that the locus M of global minima of the loss L is usually not a discrete set of points but rather an (n-d)-dimensional submanifold of R^n. In practice, neural nets commonly have orders of magnitude more parameters than data points, so this observation implies that M is typically a very high-dimensional subset of R^n.
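To see why the dimension count n - d is plausible, here is a minimal sketch of the standard regular-value argument; this is an illustrative reconstruction, not necessarily the paper's own proof, and it assumes a squared-error loss, a smooth parameterization, and an interpolating solution at which the evaluation map has full rank:

\[
L(\theta) \;=\; \sum_{i=1}^{d} \bigl(f_\theta(x_i) - y_i\bigr)^2,
\qquad
M \;=\; L^{-1}(0) \;=\; \{\theta \in \mathbb{R}^n : F(\theta) = y\},
\]
where \(F(\theta) = \bigl(f_\theta(x_1), \dots, f_\theta(x_d)\bigr) \in \mathbb{R}^d\) stacks the network's outputs on the d training inputs and \(y \in \mathbb{R}^d\) stacks the labels. If y is a regular value of F, i.e. the Jacobian \(DF(\theta)\) has rank d at every \(\theta \in M\), then by the preimage (regular value) theorem M is a smooth submanifold of \(\mathbb{R}^n\) of dimension n - d. Overparameterization (n > d) is what makes a full-rank Jacobian, and hence a positive-dimensional M, possible.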