Gradient-trained Weights in Wide Neural Networks Align Layerwise to Error-scaled Input Correlations

Published by Akhilan Boopathy

Publication date: 2021

Research field: Informatics Engineering, Mathematical Statistics

Language: English

Authors: Akhilan Boopathy, Ila Fiete

Machine learning

Recent works have examined how deep neural networks, which can solve a variety of difficult problems, incorporate the statistics of training data to achieve their success. However, existing results have been established only in limited settings. In this work, we derive the layerwise weight dynamics of infinite-width neural networks with nonlinear activations trained by gradient descent. We show theoretically that weight updates are aligned with input correlations from intermediate layers weighted by error, and demonstrate empirically that the result also holds in finite-width wide networks. The alignment result allows us to formulate backpropagation-free learning rules, named Align-zero and Align-ada, that theoretically achieve the same alignment as backpropagation. Finally, we test these learning rules on benchmark problems in feedforward and recurrent neural networks and demonstrate, in wide networks, comparable performance to backpropagation.
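As a minimal illustration of what "weight updates aligned with error-weighted input correlations" can mean, the sketch below takes the simplest case: one linear layer trained with mean squared error. It is not the paper's derivation, which concerns infinite-width networks with nonlinear activations and is not reproduced on this page. The function names (`layer_update`, `cosine`), the random data, and the learning rate are all assumptions made for illustration.

```python
import math
import random

def layer_update(W, X, Y, lr=0.1):
    """One gradient-descent step for a linear layer with mean squared error.

    W: d_out x d_in weights, X: n inputs of length d_in, Y: n targets of length d_out.
    Returns (update, errors), where update has the same shape as W.
    """
    n, d_out, d_in = len(X), len(W), len(W[0])
    # Prediction error for every sample: e_i = W x_i - y_i
    errors = [[sum(W[o][k] * x[k] for k in range(d_in)) - y[o] for o in range(d_out)]
              for x, y in zip(X, Y)]
    # Gradient of (1/2n) * sum_i ||e_i||^2 is (1/n) * sum_i e_i x_i^T
    grad = [[sum(e[o] * x[k] for e, x in zip(errors, X)) / n for k in range(d_in)]
            for o in range(d_out)]
    update = [[-lr * g for g in row] for row in grad]
    return update, errors

def cosine(A, B):
    """Cosine similarity between two matrices treated as flat vectors."""
    a = [v for row in A for v in row]
    b = [v for row in B for v in row]
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

# Hypothetical toy data, chosen only to exercise the functions above
random.seed(0)
n, d_in, d_out = 32, 4, 3
X = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n)]
Y = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(n)]
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

update, errors = layer_update(W, X, Y)

# Input statistics weighted by the negative error, accumulated sample by sample
error_scaled = [[0.0] * d_in for _ in range(d_out)]
for e, x in zip(errors, X):
    for o in range(d_out):
        for k in range(d_in):
            error_scaled[o][k] -= e[o] * x[k]

alignment = cosine(update, error_scaled)
print(f"cosine alignment: {alignment:.6f}")
```

In this single-layer case the alignment is exact by construction, so the cosine equals 1 up to floating-point rounding. The abstract's claim is that a comparable alignment holds layerwise in wide nonlinear networks, which this toy example does not test.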

127 - Waleed Mustafa, Robert A. Vandermeulen, Marius Kloft 2020

Regularizing the input gradient has been shown to be effective in promoting the robustness of neural networks. The regularization of the input Hessian is therefore a natural next step. A key challenge here is the computational complexity. Computing the Hessian of inputs is computationally infeasible. In this paper we propose an efficient algorithm to train deep neural networks with Hessian operator-norm regularization. We analyze the approach theoretically and prove that the Hessian operator norm relates to the ability of a neural network to withstand an adversarial attack. We give a preliminary experimental evaluation on the MNIST and FMNIST datasets, which demonstrates that the new regularizer can, indeed, be feasible and, furthermore, that it increases the robustness of neural networks over input gradient regularization.

Machine learning

Exploiting Layerwise Convexity of Rectifier Networks with Sign Constrained Weights

45 - Senjian An, Farid Boussaid, Mohammed Bennamoun 2017

By introducing sign constraints on the weights, this paper proposes sign constrained rectifier networks (SCRNs), whose training can be solved efficiently by the well-known majorization-minimization (MM) algorithms. We prove that the proposed two-hidden-layer SCRNs, which exhibit negative weights in the second hidden layer and negative weights in the output layer, are capable of separating any two (or more) disjoint pattern sets. Furthermore, the proposed two-hidden-layer SCRNs can decompose the patterns of each class into several clusters so that each cluster is convexly separable from all the patterns from the other classes. This provides a means to learn the pattern structures and analyse the discriminant factors between different classes of patterns.

Machine learning, Artificial intelligence

Robustness of Bayesian Neural Networks to Gradient-Based Attacks

114 - Ginevra Carbone, Matthew Wicker, Luca Laurenti 2020

Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, the problem remains open. In this paper, we analyse the geometry of adversarial attacks in the large-data, overparametrized limit for Bayesian Neural Networks (BNNs). We show that, in the limit, vulnerability to gradient-based attacks arises as a result of degeneracy in the data distribution, i.e., when the data lies on a lower-dimensional submanifold of the ambient space. As a direct consequence, we demonstrate that in the limit BNN posteriors are robust to gradient-based adversarial attacks. Experimental results on the MNIST and Fashion MNIST datasets with BNNs trained with Hamiltonian Monte Carlo and Variational Inference support this line of argument, showing that BNNs can display both high accuracy and robustness to gradient-based adversarial attacks.

Machine learning

Gradient Boosting Neural Networks: GrowNet

333 - Sarkhan Badirli, Xuanqing Liu, Zhengming Xing 2020

A novel gradient boosting framework is proposed where shallow neural networks are employed as "weak learners". General loss functions are considered under this unified framework with specific examples presented for classification, regression, and learning to rank. A fully corrective step is incorporated to remedy the pitfall of greedy function approximation of classic gradient boosting decision trees. The proposed model outperformed state-of-the-art boosting methods in all three tasks on multiple datasets. An ablation study is performed to shed light on the effect of each model component and the model hyperparameters.

Machine learning

Neural Networks Trained on Natural Scenes Exhibit Gestalt Closure

88 - Been Kim, Emily Reif, Martin Wattenberg 2019

The Gestalt laws of perceptual organization, which describe how visual elements in an image are grouped and interpreted, have traditionally been thought of as innate despite their ecological validity. We use deep-learning methods to investigate whether natural scene statistics might be sufficient to derive the Gestalt laws. We examine the law of closure, which asserts that human visual perception tends to close the gap by assembling elements that can jointly be interpreted as a complete figure or object. We demonstrate that a state-of-the-art convolutional neural network, trained to classify natural images, exhibits closure on synthetic displays of edge fragments, as assessed by similarity of internal representations. This finding provides support for the hypothesis that the human perceptual system is even more elegant than the Gestaltists imagined: a single law (adaptation to the statistical structure of the environment) might suffice as fundamental.

Machine learning