Understanding Weight Normalized Deep Neural Networks with Rectified Linear Units


Abstract in English

This paper presents a general framework for norm-based capacity control for $L_{p,q}$ weight normalized deep neural networks. We establish an upper bound on the Rademacher complexities of this family. With an $L_{p,q}$ normalization where $q \le p^{*}$ and $1/p + 1/p^{*} = 1$, we discuss properties of a width-independent capacity control, which depends on the depth only through a square root term. We further analyze the approximation properties of $L_{p,q}$ weight normalized deep neural networks. In particular, for an $L_{1,\infty}$ weight normalized network, the approximation error can be controlled by the $L_1$ norm of the output layer, and the corresponding generalization error depends on the architecture only through the square root of the depth.
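
As an illustration of the quantities named above, the following minimal sketch computes an $L_{p,q}$ norm of a single weight matrix and the conjugate exponent $p^{*}$. The abstract does not state the matrix-norm convention, so the code assumes one: the $L_p$ norm is taken over each row (read here as the incoming weights of one unit), followed by the $L_q$ norm over the resulting row norms. The function names `lpq_norm`, `conjugate_exponent`, and `normalize_lpq`, and the target norm `c`, are hypothetical and chosen only for this example.

```python
import math


def vec_norm(values, r):
    """L_r norm of a sequence of numbers; r may be math.inf."""
    if math.isinf(r):
        return max(abs(v) for v in values)
    return sum(abs(v) ** r for v in values) ** (1.0 / r)


def lpq_norm(weights, p, q):
    """L_{p,q} norm of a weight matrix given as a list of rows.

    Assumed convention (not stated in the abstract): L_p norm of each
    row, then L_q norm of the vector of row norms.
    """
    return vec_norm([vec_norm(row, p) for row in weights], q)


def conjugate_exponent(p):
    """Return p* with 1/p + 1/p* = 1 (p* is infinite when p == 1)."""
    if p == 1:
        return math.inf
    if math.isinf(p):
        return 1.0
    return p / (p - 1.0)


def normalize_lpq(weights, p, q, c=1.0):
    """Rescale a nonzero matrix so that its L_{p,q} norm equals c."""
    scale = c / lpq_norm(weights, p, q)
    return [[scale * w for w in row] for row in weights]


if __name__ == "__main__":
    W = [[1.0, -2.0], [0.5, 0.5]]
    p, q = 1, math.inf
    # For p = 1 the conjugate exponent is infinite, so q = inf meets q <= p*.
    print(q <= conjugate_exponent(p))
    print(lpq_norm(W, p, q))
    print(lpq_norm(normalize_lpq(W, p, q), p, q))
```

Under this assumed convention, the $L_{1,\infty}$ case bounds the largest row-wise $L_1$ norm, which is one way to read why the resulting constraint does not grow with the number of rows (the layer width).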
