A Note on the Implicit Bias Towards Minimal Depth of Deep Neural Networks

02/18/2022
by Tomer Galanti, et al.

Deep learning systems have steadily advanced the state of the art across a wide variety of benchmarks, demonstrating impressive performance in tasks ranging from image classification <cit.> and language processing <cit.> to open-ended environments <cit.> and coding <cit.>. A central aspect enabling the success of these systems is the ability to train deep models rather than wide, shallow ones <cit.>. Intuitively, a deep network decomposes its computation into hierarchical representations, progressing from raw data to high-level, more abstract features. Yet, while training deep neural networks consistently achieves superior performance over their shallow counterparts, an understanding of the role of depth in representation learning is still lacking. In this work, we suggest a new perspective on the role of depth in deep learning. We hypothesize that SGD training of overparameterized neural networks exhibits an implicit bias that favors solutions of minimal effective depth: SGD tends to find networks in which the top several layers are redundant. To evaluate the redundancy of layers, we revisit the recently discovered phenomenon of neural collapse <cit.>.
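To make the idea of "redundant top layers" concrete, the sketch below computes one commonly used neural collapse statistic, the ratio of within-class to between-class feature variability (often called NC1), at each layer. The helper name `nc1_variability`, the synthetic features, and the specific metric are illustrative assumptions for exposition, not necessarily the authors' exact measurement protocol; the intuition is that once this ratio stops improving beyond some layer, the layers above it add little class separation.

```python
# Minimal sketch (assumed setup): per-layer neural-collapse-style
# variability ratio. Layers whose ratio no longer improves are
# candidates for being "redundant" in the sense of the paper.
import numpy as np

def nc1_variability(features: np.ndarray, labels: np.ndarray) -> float:
    """Within-class vs. between-class variability of one layer's features.

    features: (n_samples, dim) activations at a given layer
    labels:   (n_samples,) integer class labels
    Returns tr(Sigma_W) / tr(Sigma_B); values near 0 indicate collapse
    of each class's features onto its class mean.
    """
    global_mean = features.mean(axis=0)
    sw, sb = 0.0, 0.0
    for c in np.unique(labels):
        cls = features[labels == c]
        mu_c = cls.mean(axis=0)
        sw += ((cls - mu_c) ** 2).sum()                      # within-class scatter
        sb += len(cls) * ((mu_c - global_mean) ** 2).sum()   # between-class scatter
    return sw / sb

# Toy usage with fake per-layer features: deeper "layers" have less noise,
# so the NC1 ratio shrinks and then plateaus, mimicking redundant top layers.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
for layer, noise in enumerate([1.0, 0.3, 0.05, 0.05]):
    feats = np.eye(3)[labels] + noise * rng.normal(size=(300, 3))
    print(f"layer {layer}: NC1 ratio = {nc1_variability(feats, labels):.3f}")
```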
