Gradient Descent Converges to Ridgelet Spectrum

07/07/2020
by Sho Sonoda, et al.

Deep learning achieves high generalization performance in practice despite the non-convexity of the gradient descent learning problem. Recently, the inductive bias of deep learning has been studied through the characterization of local minima. In this study, we show that the distribution of parameters learned by gradient descent converges to a spectrum of the ridgelet transform, a wavelet-like integral transform developed for neural networks. This convergence is stronger than those shown in previous results, and it guarantees that the shape of the parameter distribution is identified with the ridgelet spectrum. In numerical experiments with finite models, we visually confirm the resemblance between the distribution of learned parameters and the ridgelet spectrum. Our study provides a better understanding of the theoretical background of inductive bias theories based on lazy regimes.
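To make the central object concrete, the following is a minimal numerical sketch of a ridgelet spectrum R f(a, b) = ∫ f(x) ψ(ax − b) dx for a one-dimensional target function. The choice of ridgelet function ψ (here a Gaussian second derivative), the target f, and the grids are illustrative assumptions, not the paper's setup; the paper's claim concerns how the (a, b) parameters of a trained network distribute relative to such a spectrum.

```python
import numpy as np

def ridgelet_spectrum(f, a_grid, b_grid, x, psi):
    """Approximate R f(a, b) = \int f(x) psi(a*x - b) dx by a Riemann sum."""
    dx = x[1] - x[0]
    fx = f(x)
    # Build the (a, b) grid, then evaluate psi(a*x - b) for every x.
    A, B = np.meshgrid(a_grid, b_grid, indexing="ij")
    Z = A[..., None] * x - B[..., None]        # shape (len(a), len(b), len(x))
    return (fx * psi(Z)).sum(axis=-1) * dx     # integrate over x

# Illustrative (hypothetical) choices of ridgelet function and target:
psi = lambda z: (z**2 - 1) * np.exp(-z**2 / 2)  # Gaussian second derivative
f = lambda x: np.sin(x) * np.exp(-x**2 / 8)     # a smooth, localized target

x = np.linspace(-10.0, 10.0, 1001)
a = np.linspace(-3.0, 3.0, 61)
b = np.linspace(-5.0, 5.0, 101)
S = ridgelet_spectrum(f, a, b, x, psi)
print(S.shape)  # one spectrum value per (a, b) pair: (61, 101)
```

A heat map of `S` over the (a, b) plane is the kind of picture the paper's experiments compare against the scatter of learned weight/bias pairs.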
