Understanding Weight Normalized Deep Neural Networks with Rectified Linear Units
This paper presents a general framework for norm-based capacity control of L_{p,q} weight normalized deep neural networks. We establish upper bounds on the Rademacher complexity of this family. With an L_{p,q} normalization where q < p^* and 1/p + 1/p^* = 1, the resulting capacity control is width independent and depends on the depth only through a square-root term. We further analyze the approximation properties of L_{p,q} weight normalized deep neural networks. In particular, for an L_{1,∞} weight normalized network, the approximation error can be controlled by the L_1 norm of the output layer, and the corresponding generalization error depends on the architecture only through the square root of the depth.
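To make the central object concrete, the following is a minimal NumPy sketch, not taken from the paper, of the L_{p,q} group norm of a weight matrix and a simple rescaling that enforces the normalization in a ReLU network. The per-row (per-hidden-unit) grouping convention and the helper names lpq_norm, normalize_layer, and forward are assumptions made for this illustration.

```python
import numpy as np

def lpq_norm(W, p, q):
    # L_p norm of each row (the incoming weights of one hidden unit),
    # then L_q norm across rows; p=1, q=np.inf gives the L_{1,inf} case.
    row_norms = np.linalg.norm(W, ord=p, axis=1)
    return np.linalg.norm(row_norms, ord=q)

def normalize_layer(W, p, q, c=1.0):
    # Rescale W so that its L_{p,q} norm is at most c.
    n = lpq_norm(W, p, q)
    return W if n <= c else W * (c / n)

def forward(x, hidden_weights, output_weights, p=1, q=np.inf):
    # Forward pass of a ReLU network whose hidden layers are L_{p,q} normalized.
    h = x
    for W in hidden_weights:
        h = np.maximum(normalize_layer(W, p, q) @ h, 0.0)  # ReLU activation
    return output_weights @ h  # linear output layer

# Example: a three-hidden-layer L_{1,inf} normalized network on a random input.
rng = np.random.default_rng(0)
dims = [10, 32, 32, 32]
hidden = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(3)]
out = rng.standard_normal((1, dims[-1]))
print(forward(rng.standard_normal(dims[0]), hidden, out))
```

In this sketch, the L_1 norm of the output weights (np.abs(out).sum() above) plays the role of the quantity that, per the abstract, controls the approximation error in the L_{1,∞} normalized case.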