Motivated by questions originating from the study of a class of shallow student-teacher neural networks, methods are developed for the analysis of spurious minima in classes of gradient equivariant dynamics related to neural nets. In the symmetric case, methods depend on the generic equivariant bifurcation theory of irreducible representations of the symmetric group on $n$ symbols, $S_n$; in particular, the standard representation of $S_n$. It is shown that spurious minima do not arise from spontaneous symmetry breaking but rather through a complex deformation of the landscape geometry that can be encoded by a generic $S_n$-equivariant bifurcation. We describe minimal models for forced symmetry breaking that give a lower bound on the dynamic complexity involved in the creation of spurious minima when there is no symmetry. Results on generic bifurcation when there are quadratic equivariants are also proved; this work extends and clarifies results of Ihrig & Golubitsky and Chossat, Lauterbach & Melbourne on the instability of solutions when there are quadratic equivariants.
We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ hidden neurons, where labels are assumed to be generated by a (teacher) neural network. We leverage the rich symmetry exhibited by such models to identify various families of critical points and express them as power series in $k^{-\frac{1}{2}}$. These expressions are then used to derive estimates for several related quantities, which imply that not all spurious minima are alike. In particular, we show that while the loss function at certain types of spurious minima decays to zero like $k^{-1}$, in other cases the loss converges to a strictly positive constant. The methods used depend on symmetry, the geometry of group actions, bifurcation, and Artin's implicit function theorem.
In this paper we give a Casimir invariant for the symmetric group $S_n$. Furthermore, we obtain and present, for the first time in the literature, explicit formulas for the matrices of the standard representation in terms of the matrices of the permutation representation.
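The relationship between the two representations mentioned above can be illustrated concretely. The standard representation is the restriction of the permutation representation to the invariant subspace $\{x : \sum_i x_i = 0\}$, so its matrices can be computed by a change of basis. The following is a minimal numerical sketch of that classical construction (not the paper's closed-form formulas); the basis $e_i - e_{i+1}$ and function names are choices made here for illustration:

```python
import numpy as np

def perm_matrix(sigma):
    """Permutation matrix P with P @ e_i = e_{sigma(i)},
    where sigma is a tuple of images of 0..n-1."""
    n = len(sigma)
    P = np.zeros((n, n))
    for i, j in enumerate(sigma):
        P[j, i] = 1.0
    return P

def standard_rep(sigma):
    """(n-1) x (n-1) matrix of sigma in the standard representation:
    the permutation representation restricted to the invariant
    subspace {x : sum(x) = 0}, in the basis e_i - e_{i+1}."""
    n = len(sigma)
    B = np.zeros((n, n - 1))          # columns span the sum-zero subspace
    for i in range(n - 1):
        B[i, i], B[i + 1, i] = 1.0, -1.0
    # the subspace is invariant, so P @ B = B @ S has an exact solution S
    S, *_ = np.linalg.lstsq(B, perm_matrix(sigma) @ B, rcond=None)
    return S

S = standard_rep((1, 0, 2, 3))   # the transposition (0 1) in S_4
```

A quick sanity check is the character identity: the trace of `standard_rep(sigma)` equals the number of fixed points of `sigma` minus one.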
Estimation of parameters in differential equation models can be achieved by applying learning algorithms to quantitative time-series data. However, sometimes it is only possible to measure qualitative changes of a system in response to a controlled condition. In dynamical systems theory, such change points are known as bifurcations and lie on a function of the controlled condition called the bifurcation diagram. In this work, we propose a gradient-based semi-supervised approach for inferring the parameters of differential equations that produce a user-specified bifurcation diagram. The cost function contains a supervised error term that is minimal when the model bifurcations match the specified targets and an unsupervised bifurcation measure which has gradients that push optimisers towards bifurcating parameter regimes. The gradients can be computed without the need to differentiate through the operations of the solver that was used to compute the diagram. We demonstrate parameter inference with minimal models which explore the space of saddle-node and pitchfork diagrams and the genetic toggle switch from synthetic biology. Furthermore, the cost landscape allows us to organise models in terms of topological and geometric equivalence.
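The key idea that gradients need not be propagated through the solver can be illustrated on a toy problem. The sketch below (not the authors' implementation; the model $\dot x = \mu + \theta x - x^2$ and all names are choices made here) locates a saddle-node point by Newton iteration on the extended system $f = 0$, $f_x = 0$, and then obtains the gradient of the fold location with respect to the parameter $\theta$ by the implicit function theorem, so the Newton loop itself is never differentiated:

```python
import numpy as np

# toy model f(x, mu; theta) = mu + theta*x - x**2; its fold lies at mu = -theta^2/4
def F(z, theta):
    x, mu = z
    return np.array([mu + theta * x - x ** 2,  # f = 0: equilibrium condition
                     theta - 2.0 * x])         # f_x = 0: fold condition

def J_z(z, theta):
    x, _ = z
    return np.array([[theta - 2.0 * x, 1.0],
                     [-2.0, 0.0]])

def find_fold(theta, z0=(0.0, 0.0), iters=50):
    z = np.array(z0, float)
    for _ in range(iters):                     # Newton on the extended system
        z = z - np.linalg.solve(J_z(z, theta), F(z, theta))
    return z                                   # (x*, mu*) at the fold

def dmu_dtheta(z, theta):
    # implicit function theorem: dz/dtheta = -J_z^{-1} dF/dtheta,
    # so no differentiation through the Newton solver is needed
    x, _ = z
    return (-np.linalg.solve(J_z(z, theta), np.array([x, 1.0])))[1]

# supervised term: move the fold to a target location by gradient descent on theta
mu_target, theta = -1.0, 0.5
for _ in range(200):
    z = find_fold(theta)
    grad = 2.0 * (z[1] - mu_target) * dmu_dtheta(z, theta)
    theta -= 0.5 * grad
```

For this model the fold sits at $\mu^* = -\theta^2/4$, so driving it to $\mu_{\text{target}} = -1$ recovers $\theta = 2$, which the implicit gradient $\mathrm d\mu^*/\mathrm d\theta = -\theta/2$ confirms analytically.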
Symmetries and equivariance are fundamental to the generalization of neural networks on domains such as images, graphs, and point clouds. Existing work has primarily focused on a small number of groups, such as the translation, rotation, and permutation groups. In this work we provide a completely general algorithm for solving for the equivariant layers of matrix groups. In addition to recovering solutions from other works as special cases, we construct multilayer perceptrons equivariant to multiple groups that have never been tackled before, including $\mathrm{O}(1,3)$, $\mathrm{O}(5)$, $\mathrm{Sp}(n)$, and the Rubik's cube group. Our approach outperforms non-equivariant baselines, with applications to particle physics and dynamical systems. We release our software library to enable researchers to construct equivariant layers for arbitrary matrix groups.
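The core linear-algebra step behind such an algorithm can be sketched in a few lines. A layer $W$ is equivariant when $W\rho_{\mathrm{in}}(g) = \rho_{\mathrm{out}}(g)W$ for all group generators $g$; vectorizing turns this into a linear system whose null space spans all equivariant maps. This is a minimal dense-SVD sketch (the released library's actual machinery, e.g. for infinite groups, is more elaborate; all names here are choices made for illustration):

```python
import numpy as np

def equivariant_basis(gens_in, gens_out, tol=1e-8):
    """Basis of all W with W @ g_in = g_out @ W for each generator pair.
    Row-major vectorization turns the constraint into
    (I (x) g_in^T - g_out (x) I) vec(W) = 0; the null space of the
    stacked constraint matrices is extracted via SVD."""
    n_in, n_out = gens_in[0].shape[0], gens_out[0].shape[0]
    rows = [np.kron(np.eye(n_out), gi.T) - np.kron(go, np.eye(n_in))
            for gi, go in zip(gens_in, gens_out)]
    _, s, Vt = np.linalg.svd(np.vstack(rows))
    rank = int(np.sum(s > tol))
    return [v.reshape(n_out, n_in) for v in Vt[rank:]]

# example: maps equivariant under the permutation representation of S_3,
# generated by a transposition and a 3-cycle
swap = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 1.]])
cyc  = np.array([[0., 0., 1.], [1., 0., 0.], [0., 1., 0.]])
basis = equivariant_basis([swap, cyc], [swap, cyc])
# the space is spanned by the identity and the all-ones matrix, so len(basis) == 2
```

Constraining only a generating set suffices, since equivariance under generators implies equivariance under every product of them.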
Let $R=\Bbbk[x_1,\dots,x_n]$ be a polynomial ring over a field $\Bbbk$ and let $I\subset R$ be a monomial ideal preserved by the natural action of the symmetric group $\mathfrak{S}_n$ on $R$. We give a combinatorial method to determine the $\mathfrak{S}_n$-module structure of $\mathrm{Tor}_i(I,\Bbbk)$. Our formula shows that $\mathrm{Tor}_i(I,\Bbbk)$ is built from induced representations of tensor products of Specht modules associated to hook partitions, and their multiplicities are determined by topological Betti numbers of certain simplicial complexes. This result can be viewed as an $\mathfrak{S}_n$-equivariant analogue of Hochster's formula for Betti numbers of monomial ideals. We apply our results to determine extremal Betti numbers of $\mathfrak{S}_n$-invariant monomial ideals, and in particular recover formulas for their Castelnuovo--Mumford regularity and projective dimension. We also give a concrete recipe for how the Betti numbers change as we increase the number of variables, and in characteristic zero (or $>n$) we compute the $\mathfrak{S}_n$-invariant part of $\mathrm{Tor}_i(I,\Bbbk)$ in terms of $\mathrm{Tor}$ groups of the unsymmetrization of $I$.