Continuous symmetries and their breaking play a prominent role in contemporary physics. Effective low-energy field theories around symmetry-breaking states explain diverse phenomena such as superconductivity, magnetism, and the mass of nucleons. We show that such field theories can also be a useful tool in machine learning, in particular for loss functions with continuous symmetries that are spontaneously broken by random initializations. In this paper, we shed further light on our earlier published work (Bamler & Mandt, 2018) on this topic from the perspective of theoretical physics. We show that the analogies between superconductivity and symmetry breaking in temporal representation learning run deep, allowing us to formulate a gauge theory of `charged' embedding vectors in time series models. We show that making the loss function gauge invariant speeds up convergence in such models.
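As a minimal sketch of the gauge construction alluded to above (our notation and choice of coupling, not necessarily the paper's): assume time-indexed embeddings $u_{i,t} \in \mathbb{R}^d$ whose static loss terms are invariant under a single global rotation $R \in SO(d)$. A temporal smoothness term breaks the corresponding local symmetry $u_{i,t} \mapsto R_t u_{i,t}$; it can be made gauge invariant by introducing a connection $A_t \in \mathfrak{so}(d)$ that transforms along with the embeddings,
$$
\sum_{i,t}\big\|u_{i,t+1}-u_{i,t}\big\|^2
\;\longrightarrow\;
\sum_{i,t}\big\|u_{i,t+1}-e^{A_t}u_{i,t}\big\|^2 ,
\qquad
e^{A_t}\;\mapsto\;R_{t+1}\,e^{A_t}\,R_t^{\top},
$$
so that the combined loss is unchanged when embeddings and connection are transformed together.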
We consider learning two-layer neural networks using stochastic gradient descent. The mean-field description of this learning dynamics approximates the evolution of the network weights by an evolution in the space of probability distributions on $\mathbb{R}^D$ (where $D$ is the number of parameters associated to each neuron). This evolution can be defined through a partial differential equation or, equivalently, as the gradient flow in the Wasserstein space of probability distributions. Earlier work shows that, under some regularity assumptions, the mean-field description is accurate as soon as the number of hidden units is much larger than the dimension $D$. In this paper we establish stronger and more general approximation guarantees. First of all, we show that the number of hidden units only needs to be larger than a quantity dependent on the regularity properties of the data, and independent of the dimension. Next, we generalize this analysis to the case of unbounded activation functions, which was not covered by earlier bounds. We extend our results to noisy stochastic gradient descent. Finally, we show that kernel ridge regression can be recovered as a special limit of the mean-field analysis.
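For concreteness, the mean-field limit discussed above takes the following standard form (our notation; constants, scalings, and regularity assumptions omitted). Writing $\hat{\rho}^{(N)}_t = \frac{1}{N}\sum_{i=1}^{N}\delta_{\theta_i(t)}$ for the empirical distribution of the $N$ neuron parameters $\theta_i \in \mathbb{R}^D$, the SGD dynamics is approximated, as $N \to \infty$, by the distributional dynamics
$$
\partial_t \rho_t \;=\; \nabla_\theta \cdot \big( \rho_t \, \nabla_\theta \Psi(\theta; \rho_t) \big),
\qquad
\Psi(\theta; \rho) \;=\; V(\theta) + \int U(\theta, \theta') \, \rho(\mathrm{d}\theta'),
$$
where $V$ and $U$ are determined by the data distribution and the activation; this is the Wasserstein gradient flow of the population risk, and for noisy SGD a diffusion term proportional to $\Delta_\theta \rho_t$ is added.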
Mutual Information (MI) plays an important role in representation learning. However, MI is unfortunately intractable in continuous and high-dimensional settings. Recent advances establish tractable and scalable MI estimators for discovering useful representations. However, most of the existing methods are not capable of providing an accurate, low-variance estimate of MI when the MI is large. We argue that directly estimating the gradients of MI is more appealing for representation learning than estimating MI itself. To this end, we propose the Mutual Information Gradient Estimator (MIGE) for representation learning, based on score estimation of implicit distributions. MIGE provides tight and smooth gradient estimates of MI in high-dimensional and large-MI settings. We expand the applications of MIGE to both unsupervised learning of deep representations based on InfoMax and the Information Bottleneck method. Experimental results indicate significant performance improvements in learning useful representations.
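As a sketch of the kind of identity that makes gradient estimation attractive here (our notation; we assume a reparameterizable encoder $z = f_\theta(x,\epsilon)$ with noise $\epsilon$ independent of $\theta$): since $\mathbb{E}_{p_\theta}[\nabla_\theta \log p_\theta] = 0$, only the pathwise terms survive, giving
$$
\nabla_\theta I(X;Z)
\;=\; \mathbb{E}_{x,\epsilon}\!\left[ \big(\nabla_\theta f_\theta(x,\epsilon)\big)^{\top}
\Big( \nabla_z \log p(x,z) - \nabla_z \log p(z) \Big)\Big|_{z=f_\theta(x,\epsilon)} \right],
$$
so estimating $\nabla_\theta I$ reduces to estimating the score functions $\nabla_z \log p(x,z)$ and $\nabla_z \log p(z)$ of implicit distributions, which is the role of the score estimator in MIGE.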
Learning representations of data is an important problem in statistics and machine learning. While the origins of representation learning can be traced back to factor analysis and multidimensional scaling in statistics, it has become a central theme in deep learning, with important applications in computer vision and computational neuroscience. In this article, we review recent advances in learning representations from a statistical perspective. In particular, we review the following two themes: (a) unsupervised learning of vector representations and (b) learning of both vector and matrix representations.
Hamiltonian learning is crucial to the certification of quantum devices and quantum simulators. In this paper, we propose a hybrid quantum-classical Hamiltonian learning algorithm to find the coefficients of the Pauli-operator components of the Hamiltonian. Its main subroutine is a practical log-partition function estimation algorithm, which is based on minimization of the free energy of the system. Concretely, we devise a stochastic variational quantum eigensolver (SVQE) to diagonalize the Hamiltonians and then exploit the obtained eigenvalues to compute the free energy's global minimum using convex optimization. Our approach not only avoids the challenge of estimating the von Neumann entropy in free energy minimization, but also reduces the quantum resources via importance sampling in Hamiltonian diagonalization, facilitating the implementation of our method on near-term quantum devices. Finally, we demonstrate our approach's validity by conducting numerical experiments with Hamiltonians of interest in quantum many-body physics.
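For reference, the variational principle behind the log-partition subroutine mentioned above is the standard one (our notation; $\beta$ the inverse temperature, $S$ the von Neumann entropy): minimizing the free energy
$$
F(\rho) \;=\; \operatorname{Tr}(H\rho) \;-\; \beta^{-1} S(\rho),
\qquad
S(\rho) = -\operatorname{Tr}(\rho \log \rho),
$$
over density matrices $\rho$ yields the Gibbs state $\rho_\beta = e^{-\beta H}/\operatorname{Tr}\, e^{-\beta H}$ and the minimum value $F(\rho_\beta) = -\beta^{-1}\log Z$ with $Z = \operatorname{Tr}\, e^{-\beta H}$, so an estimate of the free energy's global minimum directly gives the log-partition function.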
We propose here a new symplectic quantization scheme, where quantum fluctuations of a scalar field theory stem from two main assumptions: relativistic invariance and equiprobability of the field configurations with identical value of the action. In this approach the fictitious time of stochastic quantization becomes a genuine additional time variable with respect to the coordinate time of relativity. This proper time is associated with a symplectic evolution in the action space, which allows one to investigate not only asymptotic, i.e. equilibrium, properties of the theory, but also its non-equilibrium transient evolution. In this paper, the first in a series of two, we introduce a formalism which will be applied to general relativity in the companion work Symplectic quantization II.
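As a schematic illustration of the symplectic evolution described above (our notation; the specific form of the generalized Hamiltonian is an illustrative assumption, not taken from the paper): one introduces momenta $\pi(x)$ conjugate to the field $\phi(x)$ with respect to the additional proper time $\tau$, and a generalized Hamiltonian built from the relativistic action $S[\phi]$, e.g.
$$
\mathcal{H}[\phi,\pi] \;=\; \int \mathrm{d}^4x \; \frac{\pi(x)^2}{2} \;+\; S[\phi],
\qquad
\frac{\partial \phi(x)}{\partial \tau} = \frac{\delta \mathcal{H}}{\delta \pi(x)},
\quad
\frac{\partial \pi(x)}{\partial \tau} = -\frac{\delta \mathcal{H}}{\delta \phi(x)},
$$
so that the $\tau$-evolution is symplectic and conserves $\mathcal{H}$, consistent with the equiprobability assumption stated above.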