The categorization ability of fully connected neural network models, with either discrete or continuous Q-state units, is studied in this work in replica-symmetric mean-field theory. Hierarchically correlated multi-state patterns in a two-level structure of ancestors and descendants (examples) are embedded in the network, and the categorization task consists in recognizing the ancestors when the network is trained exclusively on their descendants. Explicit results for the equilibrium properties of a $Q=3$-state model and a $Q=\infty$-state model are obtained in the form of phase diagrams and categorization curves. A strong improvement of the categorization ability is found when the network is trained with examples of low activity. The categorization ability is found to be robust to a finite threshold and to synaptic noise. The Almeida-Thouless lines that limit the validity of the replica-symmetric results are also obtained.
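As a concrete illustration of the two-level ancestor/descendant structure, the following Python sketch generates three-state ancestor patterns of a given activity and noisy descendants (examples) from them; the parameter names (a for the activity, b for the corruption probability, s for the number of examples per ancestor) are chosen here for illustration and need not match the paper's notation.

    import numpy as np

    rng = np.random.default_rng(0)

    def ancestors(p, N, a):
        """p ancestor patterns of N three-state units with activity a
        (fraction of non-zero units), states in {-1, 0, +1}."""
        xi = np.zeros((p, N), dtype=int)
        active = rng.random((p, N)) < a
        xi[active] = rng.choice([-1, 1], size=active.sum())
        return xi

    def descendants(xi, s, b):
        """s examples per ancestor: each unit is kept with probability 1 - b
        and resampled from the single-site ancestor statistics otherwise."""
        p, N = xi.shape
        ex = np.repeat(xi[:, None, :], s, axis=1)        # shape (p, s, N)
        resample = rng.random((p, s, N)) < b
        noise = ancestors(p * s, N, a=np.mean(xi != 0)).reshape(p, s, N)
        ex[resample] = noise[resample]
        return ex

    xi = ancestors(p=5, N=1000, a=0.2)       # low-activity ancestors
    examples = descendants(xi, s=10, b=0.3)  # training set: descendants only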
The parallel dynamics of the fully connected Blume-Emery-Griffiths neural network model is studied at zero temperature for arbitrary using a probabilistic approach. A recursive scheme is found that determines the complete time evolution of the order parameters, taking into account all feedback correlations. It is based upon the evolution of the distribution of the local field, the structure of which is determined in detail. As an illustrative example, explicit analytic formulae are given for the first few time steps of the dynamics. Furthermore, equilibrium fixed-point equations are derived and compared with the thermodynamic approach. The analytic results find excellent confirmation in extensive numerical simulations.
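To make the zero-temperature parallel update concrete, here is a minimal sketch, assuming the standard three-state (spin-1) formulation in which each neuron sees a linear local field and a quadratic local field and, at zero temperature, adopts the state that maximizes their combination; the coupling matrices J and K and all names are illustrative, not the paper's notation.

    import numpy as np

    rng = np.random.default_rng(0)

    def parallel_step(sigma, J, K):
        """One zero-temperature parallel update of a three-state network.

        sigma : current states in {-1, 0, +1}, shape (N,)
        J, K  : linear and quadratic coupling matrices, shape (N, N)
        Each neuron i receives a linear field h_i and a quadratic field
        theta_i and moves to the state s maximizing s*h_i + s**2*theta_i.
        """
        h = J @ sigma                 # linear local field
        theta = K @ (sigma ** 2)      # quadratic local field
        states = np.array([-1, 0, 1])
        score = np.outer(h, states) + np.outer(theta, states ** 2)
        return states[np.argmax(score, axis=1)]

    # illustrative usage with random symmetric couplings and states
    N = 100
    J = rng.normal(scale=1 / np.sqrt(N), size=(N, N)); J = (J + J.T) / 2
    K = rng.normal(scale=1 / np.sqrt(N), size=(N, N)); K = (K + K.T) / 2
    sigma = rng.choice([-1, 0, 1], size=N)
    sigma_next = parallel_step(sigma, J, K)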
It is known that a Restricted Boltzmann Machine (RBM) trained on binary Monte Carlo Ising spin configurations generates a series of iteratively reconstructed spin configurations that spontaneously flow to, and stabilize at, the critical point of the physical system. Here we construct a variety of Neural Network (NN) flows using the RBM and (variational) autoencoders to study the q-state Potts and clock models on the square lattice for q = 2, 3, 4. The NNs are trained on Monte Carlo spin configurations at various temperatures. We find that the trained NN flow does develop a stable point that coincides with the critical point of the q-state spin models. The behavior of the NN flow is nontrivial and generative, since the training is unsupervised and uses no prior knowledge of the critical point or of the Hamiltonian of the underlying spin model. Moreover, we find that the convergence of the flow is independent of the types of NNs and spin models, hinting at a universal behavior. Our results strengthen the potential applicability of the notion of the NN flow in studying various states of matter and offer additional evidence of the connection with the Renormalization Group flow.
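As a rough sketch of what a single RBM-based NN-flow trajectory could look like, the code below iterates the standard visible-to-hidden-to-visible reconstruction of a binary RBM; the weight matrix W, the biases, and the number of flow steps are placeholders, so this is only a minimal illustration under those assumptions, not the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    def rbm_flow(v, W, b_vis, b_hid, n_steps=10):
        """Iterate the RBM reconstruction v -> h -> v' on binary {0,1} units.

        Starting from a spin configuration v, each step samples the hidden
        layer and then resamples the visible layer; the sequence of visible
        configurations is the 'NN flow' of that configuration.
        """
        flow = [v]
        for _ in range(n_steps):
            h = (rng.random(b_hid.shape) < sigmoid(v @ W + b_hid)).astype(float)
            v = (rng.random(b_vis.shape) < sigmoid(h @ W.T + b_vis)).astype(float)
            flow.append(v)
        return flow

    # illustrative shapes only: 64 spins, 32 hidden units, random weights
    W = rng.normal(scale=0.1, size=(64, 32))
    b_vis, b_hid = np.zeros(64), np.zeros(32)
    v0 = (rng.random(64) < 0.5).astype(float)
    configs = rbm_flow(v0, W, b_vis, b_hid)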
The fully connected (FC) layer, one of the most fundamental modules in artificial neural networks (ANNs), is often considered difficult and inefficient to train, in part because its large number of parameters increases the risk of overfitting. Building on previous work that studies ANNs from a linear-spline perspective, we propose a spline-based approach that eases the difficulty of training FC layers. Given a dataset, we first obtain a continuous piecewise-linear (CPWL) fit through spline methods such as multivariate adaptive regression splines (MARS). Next, we construct an ANN model from the linear spline model and continue to train the ANN model on the dataset using gradient-descent optimization algorithms. Our experimental results and theoretical analysis show that, compared with standard ANN training with random parameter initialization followed by gradient-descent optimization, our approach reduces the computational cost, accelerates the convergence of FC layers, and significantly increases the interpretability of the resulting model (FC layers).
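The spline-to-ANN construction rests on the fact that a hinge term max(0, ±(x_j - t)) is exactly a ReLU unit acting on one input coordinate. The following is a minimal sketch, under that assumption and ignoring MARS interaction terms, of how a fitted set of hinges could initialize the first FC layer; the hinge list format and variable names are hypothetical.

    import numpy as np

    def hinges_to_fc_layer(hinges, coefs, intercept, n_inputs):
        """Initialize a one-hidden-layer ReLU network from MARS-style hinges.

        hinges : list of (j, t, sign) with sign in {+1, -1}, encoding the
                 basis function max(0, sign * (x_j - t))
        coefs  : linear coefficients of the fitted spline model, one per hinge
        Returns (W1, b1, w2, b2) so that f(x) = w2 @ relu(W1 @ x + b1) + b2
        reproduces the CPWL spline fit exactly at initialization.
        """
        n_hidden = len(hinges)
        W1 = np.zeros((n_hidden, n_inputs))
        b1 = np.zeros(n_hidden)
        for k, (j, t, sign) in enumerate(hinges):
            W1[k, j] = sign            # unit acts on coordinate j only
            b1[k] = -sign * t          # shift by the knot location t
        w2 = np.asarray(coefs, dtype=float)
        b2 = float(intercept)
        return W1, b1, w2, b2

    # e.g. two hinges on coordinate 0: max(0, x0 - 1.0) and max(0, -(x0 - 1.0))
    W1, b1, w2, b2 = hinges_to_fc_layer(
        [(0, 1.0, +1), (0, 1.0, -1)], [2.0, -0.5], 0.1, n_inputs=3)

The resulting weights can then be refined with any gradient-descent optimizer, as the abstract describes.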
The effects of a variable amount of random dilution of the synaptic couplings in Q-Ising multi-state neural networks with Hebbian learning are examined. A fraction of the couplings is explicitly allowed to be anti-Hebbian. Random dilution represents the dying or pruning of synapses and, hence, a static disruption of the learning process, which can be considered a form of multiplicative noise in the learning rule. Both parallel and sequential updating of the neurons can be treated. Symmetric dilution in the statics of the network is studied using the mean-field-theory approach of statistical mechanics. General dilution, including asymmetric pruning of the couplings, is examined using the generating-functional (path-integral) approach to disordered systems. It is shown that random dilution acts as additive Gaussian noise in the Hebbian learning rule, with zero mean and a variance depending on the connectivity of the network and on the symmetry. Furthermore, a scaling factor appears that essentially measures the average amount of anti-Hebbian couplings.
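As a simple illustration of dilution acting as multiplicative noise on the learning rule, the sketch below builds a Hebbian coupling matrix and multiplies each coupling by a random factor that is zero (pruned), +1 (Hebbian), or -1 (anti-Hebbian); the connectivity c and anti-Hebbian fraction f are illustrative parameters, not the paper's notation.

    import numpy as np

    rng = np.random.default_rng(0)

    def diluted_hebbian(xi, c=0.5, f=0.1):
        """Randomly diluted Hebbian couplings for Q-Ising patterns xi (p x N).

        Each coupling J_ij is kept with probability c and, if kept, is made
        anti-Hebbian (sign flipped) with probability f.  Dilution is applied
        symmetrically here (the factor for ij equals that for ji); asymmetric
        pruning would draw the factors independently for (i, j) and (j, i).
        """
        p, N = xi.shape
        J = xi.T @ xi / N                       # Hebbian rule
        keep = rng.random((N, N)) < c           # surviving synapses
        anti = rng.random((N, N)) < f           # anti-Hebbian subset
        factor = np.where(keep, np.where(anti, -1.0, 1.0), 0.0)
        factor = np.triu(factor, 1); factor = factor + factor.T  # symmetric
        np.fill_diagonal(J, 0.0)
        return J * factor

    xi = rng.choice([-1, 0, 1], size=(5, 100))  # 5 three-state patterns
    J = diluted_hebbian(xi, c=0.5, f=0.1)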
Starting from the mutual information, we present a method to find a Hamiltonian for a fully connected neural network model with an arbitrary, finite number of neuron states, Q. For small initial correlations between the neurons and the patterns, it leads to optimal retrieval performance. For binary neurons, Q=2, and biased patterns we recover the Hopfield model. For three-state neurons, Q=3, we recover the recently introduced Blume-Emery-Griffiths network Hamiltonian. We derive its phase diagram and compare it with those of related three-state models; we find that its retrieval region is the largest.
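For orientation, the Blume-Emery-Griffiths network Hamiltonian referred to here has the generic spin-1 form

    $$ H = -\frac{1}{2}\sum_{i \neq j} J_{ij}\,\sigma_i \sigma_j \;-\; \frac{1}{2}\sum_{i \neq j} K_{ij}\,\sigma_i^2 \sigma_j^2, \qquad \sigma_i \in \{-1,0,+1\}, $$

with Hebbian-like couplings $J_{ij} \propto \frac{1}{N}\sum_{\mu} \xi_i^{\mu}\xi_j^{\mu}$ for the linear term and $K_{ij} \propto \frac{1}{N}\sum_{\mu} \big((\xi_i^{\mu})^2 - a\big)\big((\xi_j^{\mu})^2 - a\big)$ for the quadratic (activity) term, where $a$ denotes the pattern activity. The activity-dependent prefactors are omitted, so this should be read as a schematic form rather than the exact expression derived in the paper.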