Over-parameterization Improves Generalization in the XOR Detection Problem

10/06/2018
by Alon Brutzkus, et al.

Empirical evidence suggests that neural networks with ReLU activations generalize better with over-parameterization. However, there is currently no theoretical analysis that explains this observation. In this work, we study a simplified learning task with over-parameterized convolutional networks that empirically exhibits the same qualitative phenomenon. For this setting, we provide a theoretical analysis of the optimization and generalization performance of gradient descent. Specifically, we prove data-dependent sample complexity bounds which show that over-parameterization improves the generalization performance of gradient descent.
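The abstract does not spell out the experimental setup, but the phenomenon it describes can be illustrated with a minimal sketch: a toy XOR-detection dataset of two-dimensional binary patterns, and a small convolutional ReLU network trained by full-batch gradient descent, where the number of channels k controls the degree of over-parameterization. Everything below (the data construction, the hinge loss, the channel counts, the learning rate) is an illustrative assumption, not the paper's exact construction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_xord_data(n, d=4):
    # Each example is d binary patterns in {-1, +1}^2. Here the label is +1
    # iff the example contains an "XOR" pattern, (1, -1) or (-1, 1); this is
    # a simplified stand-in for the paper's exact XORD construction.
    a = torch.randint(0, 2, (n, d)) * 2 - 1   # first coordinate of each pattern
    b = a.clone()                             # second coordinate: start non-XOR
    y = torch.randint(0, 2, (n,)) * 2 - 1     # balanced +/-1 labels
    pos = torch.nonzero(y > 0).squeeze(1)
    # Positive examples get exactly one XOR pattern at a random position.
    b[pos, torch.randint(0, d, (pos.numel(),))] *= -1
    x = torch.stack([a, b], dim=2).float()    # shape (n, d, 2)
    return x.reshape(n, 1, 2 * d), y.float()

class XorDetector(nn.Module):
    # One convolutional layer with k channels (kernel size 2, stride 2, so
    # each window sees exactly one pattern), ReLU, max-pooling over patterns,
    # and a linear readout. k controls the degree of over-parameterization.
    def __init__(self, k):
        super().__init__()
        self.conv = nn.Conv1d(1, k, kernel_size=2, stride=2, bias=False)
        self.readout = nn.Linear(k, 1, bias=False)

    def forward(self, x):
        h = torch.relu(self.conv(x))   # (n, k, d)
        h = h.max(dim=2).values        # max-pool over the d patterns
        return self.readout(h).squeeze(1)

def run(k, n_train=30, steps=2000, lr=0.05):
    x, y = make_xord_data(n_train)
    model = XorDetector(k)
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # full-batch gradient descent
    for _ in range(steps):
        loss = torch.clamp(1 - y * model(x), min=0).mean()  # hinge loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    xt, yt = make_xord_data(5000)   # fresh samples to estimate generalization
    with torch.no_grad():
        acc = ((model(xt) > 0).float() * 2 - 1 == yt).float().mean().item()
    return acc

for k in (1, 4, 32):  # increasing width = increasing over-parameterization
    print(f"channels k={k:>2}: test accuracy ~ {run(k):.3f}")
```

In this toy version, a single channel can at best detect one of the two XOR patterns, so its accuracy is capped well below 100%, while with more channels gradient descent is more likely to find filters covering both patterns. That expressiveness-plus-optimization intuition is a rough, informal analogue of the effect the paper analyzes rigorously.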
