No Arabic abstract
Recurrent neural networks (RNN) are powerful tools to explain how attractors may emerge from noisy, high-dimensional dynamics. We study here how to learn the ~N^(2) pairwise interactions in a RNN with N neurons to embed L manifolds of dimension D << N. We show that the capacity, i.e. the maximal ratio L/N, decreases as |log(epsilon)|^(-D), where epsilon is the error on the position encoded by the neural activity along each manifold. Hence, RNN are flexible memory devices capable of storing a large number of manifolds at high spatial resolution. Our results rely on a combination of analytical tools from statistical mechanics and random matrix theory, extending Gardners classical theory of learning to the case of patterns with strong spatial correlations.
The optimal capacity of a diluted Blume-Emery-Griffiths neural network is studied as a function of the pattern activity and the embedding stability using the Gardner entropy approach. Annealed dilution is considered, cutting some of the couplings referring to the ternary patterns themselves and some of the couplings related to the active patterns, both simultaneously (synchronous dilution) or independently (asynchronous dilution). Through the de Almeida-Thouless criterion it is found that the replica-symmetric solution is locally unstable as soon as there is dilution. The distribution of the couplings shows the typical gap with a width depending on the amount of dilution, but this gap persists even in cases where a particular type of coupling plays no role in the learning process.
We introduce an analytically solvable model of two-dimensional continuous attractor neural networks (CANNs). The synaptic input and the neuronal response form Gaussian bumps in the absence of external stimuli, and enable the network to track external stimuli by its translational displacement in the two-dimensional space. Basis functions of the two-dimensional quantum harmonic oscillator in polar coordinates are introduced to describe the distortion modes of the Gaussian bump. The perturbative method is applied to analyze its dynamics. Testing the method by considering the network behavior when the external stimulus abruptly changes its position, we obtain results of the reaction time and the amplitudes of various distortion modes, with excellent agreement with simulation results.
Causal inference explores the causation between actions and the consequent rewards on a covariate set. Recently deep learning has achieved a remarkable performance in causal inference, but existing statistical theories cannot well explain such an empirical success, especially when the covariates are high-dimensional. Most theoretical results in causal inference are asymptotic, suffer from the curse of dimensionality, and only work for the finite-action scenario. To bridge such a gap between theory and practice, this paper studies doubly robust off-policy learning by deep neural networks. When the covariates lie on a low-dimensional manifold, we prove nonasymptotic regret bounds, which converge at a fast rate depending on the intrinsic dimension of the manifold. Our results cover both the finite- and continuous-action scenarios. Our theory shows that deep neural networks are adaptive to the low-dimensional geometric structures of the covariates, and partially explains the success of deep learning for causal inference.
Recent advances in deep learning and neural networks have led to an increased interest in the application of generative models in statistical and condensed matter physics. In particular, restricted Boltzmann machines (RBMs) and variational autoencoders (VAEs) as specific classes of neural networks have been successfully applied in the context of physical feature extraction and representation learning. Despite these successes, however, there is only limited understanding of their representational properties and limitations. To better understand the representational characteristics of RBMs and VAEs, we study their ability to capture physical features of the Ising model at different temperatures. This approach allows us to quantitatively assess learned representations by comparing sample features with corresponding theoretical predictions. Our results suggest that the considered RBMs and convolutional VAEs are able to capture the temperature dependence of magnetization, energy, and spin-spin correlations. The samples generated by RBMs are more evenly distributed across temperature than those generated by VAEs. We also find that convolutional layers in VAEs are important to model spin correlations whereas RBMs achieve similar or even better performances without convolutional filters.
We consider restricted Boltzmann machine (RBMs) trained over an unstructured dataset made of blurred copies of definite but unavailable ``archetypes and we show that there exists a critical sample size beyond which the RBM can learn archetypes, namely the machine can successfully play as a generative model or as a classifier, according to the operational routine. In general, assessing a critical sample size (possibly in relation to the quality of the dataset) is still an open problem in machine learning. Here, restricting to the random theory, where shallow networks suffice and the grand-mother cell scenario is correct, we leverage the formal equivalence between RBMs and Hopfield networks, to obtain a phase diagram for both the neural architectures which highlights regions, in the space of the control parameters (i.e., number of archetypes, number of neurons, size and quality of the training set), where learning can be accomplished. Our investigations are led by analytical methods based on the statistical-mechanics of disordered systems and results are further corroborated by extensive Monte Carlo simulations.