An analytic theory of shallow networks dynamics for hinge loss classification

Added by Franco Pellegrini

Publication date 2020

fields Mathematical Statistics Physics

Research language: English

Authors Franco Pellegrini - Giulio Biroli

Neural networks have been shown to perform incredibly well in classification tasks over structured high-dimensional datasets. However, the learning dynamics of such networks is still poorly understood. In this paper we study in detail the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task. We show that in a suitable mean-field limit this case maps to a single-node learning problem with a time-dependent dataset determined self-consistently from the average nodes population. We specialize our theory to the prototypical case of a linearly separable dataset and a linear hinge loss, for which the dynamics can be explicitly solved. This allows us to address in a simple setting several phenomena appearing in modern networks such as slowing down of training dynamics, crossover between rich and lazy learning, and overfitting. Finally, we assess the limitations of mean-field theory by studying the case of a large but finite number of nodes and of training samples.
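The setting described in the abstract can be sketched numerically. The example below is a minimal illustration, not the authors' code: it assumes a ReLU activation, a mean-field output scaling of 1/N, a hinge threshold `h = 1`, Gaussian inputs with labels given by the sign of the first coordinate, and full-batch gradient descent. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def make_separable_data(n_samples, dim, rng, margin=0.5):
    """Gaussian inputs labelled by the sign of the first coordinate (assumed toy dataset)."""
    x = rng.standard_normal((n_samples, dim))
    x[:, 0] += np.sign(x[:, 0]) * margin  # push points away from the separating hyperplane
    y = np.sign(x[:, 0])
    return x, y

def forward(w, a, x):
    """Single hidden layer with mean-field scaling: f(x) = (1/N) sum_i a_i relu(w_i . x)."""
    pre = x @ w.T                      # (P, N) pre-activations
    out = np.maximum(pre, 0.0) @ a / a.size
    return out, pre

def hinge_loss(margins, h=1.0):
    """Linear hinge loss averaged over samples: mean of max(0, h - y f(x))."""
    return np.maximum(0.0, h - margins).mean()

def train(x, y, n_hidden=200, lr=0.05, steps=300, h=1.0, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_hidden, x.shape[1]))
    a = rng.standard_normal(n_hidden)
    losses = []
    for _ in range(steps):
        out, pre = forward(w, a, x)
        margins = y * out
        losses.append(hinge_loss(margins, h))
        # Only samples with margin below h contribute to the gradient.
        g_out = -(y * (margins < h)) / x.shape[0]           # dL/df for each sample, shape (P,)
        hidden = np.maximum(pre, 0.0)
        relu_mask = (pre > 0).astype(float)
        # Per-node gradients, rescaled by N so that each node moves at an O(1) rate.
        grad_a = hidden.T @ g_out                            # (N,)
        grad_w = (g_out[:, None] * relu_mask * a[None, :]).T @ x  # (N, dim)
        a -= lr * grad_a
        w -= lr * grad_w
    return w, a, losses

rng = np.random.default_rng(0)
x, y = make_separable_data(200, 5, rng)
w, a, losses = train(x, y)
out, _ = forward(w, a, x)
print("initial loss:", losses[0], "final loss:", losses[-1])
print("training accuracy:", np.mean(np.sign(out) == y))
```

Because the hinge loss has zero gradient once a sample's margin exceeds `h`, the number of contributing samples shrinks as training proceeds, which is one simple way to see a slowing down of the dynamics in this toy setting.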

The Gaussian equivalence of generative models for learning with shallow neural networks

Sebastian Goldt, Bruno Loureiro, Galen Reeves 2020

Understanding the impact of data structure on the computational tractability of learning is a key challenge for the theory of neural networks. Many theoretical works do not explicitly model training data, or assume that inputs are drawn component-wise independently from some simple probability distribution. Here, we go beyond this simple paradigm by studying the performance of neural networks trained on data drawn from pre-trained generative models. This is possible due to a Gaussian equivalence stating that the key metrics of interest, such as the training and test errors, can be fully captured by an appropriately chosen Gaussian model. We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence. First, we establish rigorous conditions for the Gaussian equivalence to hold in the case of single-layer generative models, as well as deterministic rates for convergence in distribution. Second, we leverage this equivalence to derive a closed set of equations describing the generalisation performance of two widely studied machine learning problems: two-layer neural networks trained using one-pass stochastic gradient descent, and full-batch pre-learned features or kernel methods. Finally, we perform experiments demonstrating how our theory applies to deep, pre-trained generative models. These results open a viable path to the theoretical study of machine learning models with realistic data.

Machine Learning Disordered Systems and Neural Networks Statistical Mechanics

Semi-analytic approximate stability selection for correlated data in generalized linear models

Takashi Takahashi, Yoshiyuki Kabashima 2020

We consider the variable selection problem of generalized linear models (GLMs). Stability selection (SS) is a promising method proposed for solving this problem. Although SS provides practical variable selection criteria, it is computationally demanding because it needs to fit GLMs to many re-sampled datasets. We propose a novel approximate inference algorithm that can conduct SS without the repeated fitting. The algorithm is based on the replica method of statistical mechanics and vector approximate message passing of information theory. For datasets characterized by rotation-invariant matrix ensembles, we derive state evolution equations that macroscopically describe the dynamics of the proposed algorithm. We also show that their fixed points are consistent with the replica symmetric solution obtained by the replica method. Numerical experiments indicate that the algorithm exhibits fast convergence and high approximation accuracy for both synthetic and real-world data.

Machine Learning Disordered Systems and Neural Networks Statistical Mechanics

Classical Information Theory of Networks

Filippo Radicchi, Dmitri Krioukov, Harrison Hartle 2019

Existing information-theoretic frameworks based on maximum entropy network ensembles are not able to explain the emergence of heterogeneity in complex networks. Here, we fill this knowledge gap by developing a classical framework for networks based on finding an optimal trade-off between the information content of a compressed representation of the ensemble and the information content of the actual network ensemble. In this way we not only introduce a novel classical network ensemble satisfying a set of soft constraints but are also able to calculate the optimal distribution of the constraints. We show that for the classical network ensemble in which the only constraints are the expected degrees a power-law degree distribution is optimal. We also study spatially embedded networks, finding that the interactions between nodes naturally lead to a non-uniform spread of nodes in the space, with pairs of nodes at a given distance not necessarily obeying a power-law distribution. The pertinent features of real-world air transportation networks are well described by the proposed framework.

Physics and Society Disordered Systems and Neural Networks Statistical Mechanics

The Loss Surface of XOR Artificial Neural Networks

Dhagash Mehta, Xiaojun Zhao, Edgar A. Bernal 2018

Training an artificial neural network involves an optimization process over the landscape defined by the cost (loss) as a function of the network parameters. We explore these landscapes using optimisation tools developed for potential energy landscapes in molecular science. The number of local minima and transition states (saddle points of index one), as well as the ratio of transition states to minima, grow rapidly with the number of nodes in the network. There is also a strong dependence on the regularisation parameter, with the landscape becoming more convex (fewer minima) as the regularisation term increases. We demonstrate that in our formulation, stationary points for networks with $N_h$ hidden nodes, including the minimal network required to fit the XOR data, are also stationary points for networks with $N_{h} +1$ hidden nodes when all the weights involving the additional nodes are zero. Hence, smaller networks optimized to train the XOR data are embedded in the landscapes of larger networks. Our results clarify certain aspects of the classification and sensitivity (to perturbations in the input data) of minima and saddle points for this system, and may provide insight into dropout and network compression.
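The embedding property stated in the abstract can be checked numerically on a toy model. The sketch below is not the authors' formulation: it assumes a tanh hidden layer, a linear scalar output, a mean squared error on the four XOR points, and finite-difference gradients. All names are hypothetical. It shows that adding a hidden node with all-zero weights leaves the gradient on the existing parameters unchanged and gives zero gradient on the new ones, so any stationary point of the smaller network stays stationary in the larger one.

```python
import numpy as np

XOR_X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
XOR_Y = np.array([0., 1., 1., 0.])

def loss(w_in, b, w_out):
    """Mean squared error of a one-hidden-layer tanh network on XOR (assumed setup)."""
    hidden = np.tanh(XOR_X @ w_in.T + b)   # (4, n_hidden)
    return np.mean((hidden @ w_out - XOR_Y) ** 2)

def numerical_grad(params, eps=1e-6):
    """Central finite-difference gradient with respect to every parameter."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            up = loss(*params)
            p[idx] = old - eps
            down = loss(*params)
            p[idx] = old               # restore the original value
            g[idx] = (up - down) / (2 * eps)
        grads.append(g)
    return grads

def add_zero_node(w_in, b, w_out):
    """Append one hidden node whose incoming weights, bias and outgoing weight are zero."""
    return [np.vstack([w_in, np.zeros((1, w_in.shape[1]))]),
            np.append(b, 0.0),
            np.append(w_out, 0.0)]

rng = np.random.default_rng(0)
# Arbitrary parameters for a network with two hidden nodes.
small = [rng.standard_normal((2, 2)), rng.standard_normal(2), rng.standard_normal(2)]
large = add_zero_node(*small)

g_small = numerical_grad(small)
g_large = numerical_grad(large)

for name, gs, gl in zip(["w_in", "b", "w_out"], g_small, g_large):
    print(name, "old part matches:", np.allclose(gl[:2], gs, atol=1e-6),
          "| new part is zero:", np.allclose(gl[2], 0.0))
```

The check works at an arbitrary parameter point rather than at a located minimum, which avoids running an optimizer; the stationary-point statement follows because a zero gradient in the small network then implies a zero gradient in the enlarged one.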

Machine Learning Disordered Systems and Neural Networks Machine Learning

Cascade of Phase Transitions for Multi-Scale Clustering

T. Bonnaire, A. Decelle, N. Aghanim 2020

We present a novel framework exploiting the cascade of phase transitions occurring during a simulated annealing of the Expectation-Maximisation algorithm to cluster datasets with multi-scale structures. Using the weighted local covariance, we can extract, a posteriori and without any prior knowledge, information on the number of clusters at different scales together with their size. We also study the linear stability of the iterative scheme to derive the threshold at which the first transition occurs and show how to approximate the next ones. Finally, we combine simulated annealing together with recent developments of regularised Gaussian mixture models to learn a principal graph from spatially structured datasets that can also exhibit many scales.

Machine Learning Disordered Systems and Neural Networks Statistical Mechanics
