Capacity Control of ReLU Neural Networks by Basis-path Norm

133 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shuxin Zheng

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية الاحصاء الرياضي

والبحث باللغة English

تأليف Shuxin Zheng - Qi Meng - Huishuai Zhang

التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Recently, path norm was proposed as a new capacity measure for neural networks with Rectified Linear Unit (ReLU) activation function, which takes the rescaling-invariant property of ReLU into account. It has been shown that the generalization error bound in terms of the path norm explains the empirical generalization behaviors of the ReLU neural networks better than that of other capacity measures. Moreover, optimization algorithms which take path norm as the regularization term to the loss function, like Path-SGD, have been shown to achieve better generalization performance. However, the path norm counts the values of all paths, and hence the capacity measure based on path norm could be improperly influenced by the dependency among different paths. It is also known that each path of a ReLU network can be represented by a small group of linearly independent basis paths with multiplication and division operation, which indicates that the generalization behavior of the network only depends on only a few basis paths. Motivated by this, we propose a new norm emph{Basis-path Norm} based on a group of linearly independent paths to measure the capacity of neural networks more accurately. We establish a generalization error bound based on this basis path norm, and show it explains the generalization behaviors of ReLU networks more accurately than previous capacity measures via extensive experiments. In addition, we develop optimization algorithms which minimize the empirical risk regularized by the basis-path norm. Our experiments on benchmark datasets demonstrate that the proposed regularization method achieves clearly better performance on the test set than the previous regularization approaches.

قيم البحث

422 - Behnam Neyshabur , Ryota Tomioka , Nathan Srebro 2015

We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.

التعلم الآلي الذكاء الاصطناعي الحوسبة العصبية والتطورية

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

109 - Behnam Neyshabur , Yuhuai Wu , Ruslan Salakhutdinov 2016

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require ca pturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.

التعلم الآلي الحوسبة العصبية والتطورية

Globally Injective ReLU Networks

147 - Michael Puthawala , Konik Kothari , Matti Lassas 2020

Injectivity plays an important role in generative models where it enables inference; in inverse problems and compressed sensing with generative priors it is a precursor to well posedness. We establish sharp characterizations of injectivity of fully-c onnected and convolutional ReLU layers and networks. First, through a layerwise analysis, we show that an expansivity factor of two is necessary and sufficient for injectivity by constructing appropriate weight matrices. We show that global injectivity with iid Gaussian matrices, a commonly used tractable model, requires larger expansivity between 3.4 and 10.5. We also characterize the stability of inverting an injective network via worst-case Lipschitz constants of the inverse. We then use arguments from differential topology to study injectivity of deep networks and prove that any Lipschitz map can be approximated by an injective ReLU network. Finally, using an argument based on random projections, we show that an end-to-end -- rather than layerwise -- doubling of the dimension suffices for injectivity. Our results establish a theoretical basis for the study of nonlinear inverse and inference problems using neural networks.

التعلم الآلي التعلم الالي

ReLU Deep Neural Networks from the Hierarchical Basis Perspective

96 - Juncai He , Lin Li , Jinchao Xu 2021

We study ReLU deep neural networks (DNNs) by investigating their connections with the hierarchical basis method in finite element methods. First, we show that the approximation schemes of ReLU DNNs for $x^2$ and $xy$ are compositio

التحليل العددي التعلم الآلي التحليل العددي

Reverse-Engineering Deep ReLU Networks

83 - David Rolnick , Konrad P. Kording 2019

It has been widely assumed that a neural network cannot be recovered from its outputs, as the network depends on its parameters in a highly nonlinear way. Here, we prove that in fact it is often possible to identify the architecture, weights, and bia ses of an unknown deep ReLU network by observing only its output. Every ReLU network defines a piecewise linear function, where the boundaries between linear regions correspond to inputs for which some neuron in the network switches between inactive and active ReLU states. By dissecting the set of region boundaries into components associated with particular neurons, we show both theoretically and empirically that it is possible to recover the weights of neurons and their arrangement within the network, up to isomorphism.

التعلم الآلي التعلم الالي