Self-supervised learning (SSL) of energy-based models has an intuitive relation to equilibrium thermodynamics because the softmax layer, which maps energies to probabilities, is a Gibbs distribution. But in what way is SSL a thermodynamic process? We show that some SSL paradigms behave as a thermodynamic composite system, formed by representations and self-labels, in contact with a nonequilibrium reservoir. Moreover, this system undergoes the usual thermodynamic cycles, such as adiabatic expansion and isochoric heating, resulting in a generalized Gibbs ensemble (GGE). In this picture, learning can be seen as a demon that operates in cycles, using feedback measurements to extract negative work from the system. As applications, we examine several SSL algorithms through this lens.
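To make the softmax-as-Gibbs remark concrete: identifying the logits z_i with scaled negative energies turns the softmax into a Gibbs (Boltzmann) distribution, with the softmax temperature T playing the role of the thermodynamic temperature (the notation below is assumed for this sketch, not taken from the paper):

    p_i = \mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}
        = \frac{e^{-E_i/T}}{\sum_j e^{-E_j/T}}, \qquad z_i = -\frac{E_i}{T}.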
In this paper we present a self-contained macroscopic description of diffusive systems interacting with boundary reservoirs and under the action of external fields. The approach is based on simple postulates which are suggested by a wide class of microscopic stochastic models in which they are satisfied. The description, however, does not refer in any way to an underlying microscopic dynamics: the only inputs required are the transport coefficients, as functions of the thermodynamic variables, which are experimentally accessible. The basic postulates are local equilibrium, which allows a hydrodynamic description of the evolution; the Einstein relation among the transport coefficients; and a variational principle defining the out-of-equilibrium free energy. Associated with the variational principle is a Hamilton-Jacobi equation satisfied by the free energy, which is very useful for concrete calculations. Correlations over a macroscopic scale are, in our scheme, a generic property of nonequilibrium states. Correlation functions of any order can be calculated from the free energy functional, which is generically a nonlocal functional of the thermodynamic variables. Special attention is given to the notion of equilibrium state from the standpoint of nonequilibrium.
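For orientation, the two central ingredients can be written schematically in a common macroscopic-fluctuation-theory notation (D for the diffusion coefficient, \chi for the mobility, f for the equilibrium free energy density, E for the external field, V for the nonequilibrium free energy functional; these symbols are assumptions of this sketch, not necessarily the paper's notation):

    D(\rho) = \chi(\rho)\, f''(\rho) \qquad \text{(Einstein relation)},

    \int dx \, \Big[ \nabla\frac{\delta V}{\delta\rho} \cdot \chi(\rho)\, \nabla\frac{\delta V}{\delta\rho}
        - \nabla\frac{\delta V}{\delta\rho} \cdot \big( D(\rho)\nabla\rho - \chi(\rho) E \big) \Big] = 0
        \qquad \text{(Hamilton-Jacobi equation)}.

As a consistency check, in equilibrium (E = 0 and uniform boundary density) V reduces to the usual local free-energy difference, and the Einstein relation makes the two terms cancel.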
Self-training is an effective approach to semi-supervised learning. The key idea is to let the learner itself iteratively generate pseudo-supervision for unlabeled instances based on its current hypothesis. In combination with consistency regularization, pseudo-labeling has shown promising performance in various domains, for example in computer vision. To account for the hypothetical nature of the pseudo-labels, these are commonly provided in the form of probability distributions. Still, one may argue that even a probability distribution represents an excessive level of informedness, as it suggests that the learner precisely knows the ground-truth conditional probabilities. In our approach, we therefore allow the learner to label instances in the form of credal sets, that is, sets of (candidate) probability distributions. Thanks to this increased expressiveness, the learner is able to represent uncertainty and a lack of knowledge in a more flexible and more faithful manner. To learn from weakly labeled data of this kind, we leverage methods that have recently been proposed in the realm of so-called superset learning. In an exhaustive empirical evaluation, we compare our methodology to state-of-the-art self-supervision approaches, showing competitive to superior performance, especially in low-label scenarios with a high degree of uncertainty.
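A minimal sketch of the idea (not the authors' exact construction): an unlabeled instance receives a set of candidate classes derived from the model's current prediction, and training scores the learner only against its best-matching candidate via an optimistic superset loss. The thresholding rule, the parameter alpha, and the function names below are illustrative assumptions.

    import numpy as np

    def credal_pseudo_label(probs, alpha=0.5):
        # Hypothetical set-valued pseudo-label: keep every class whose predicted
        # probability is at least alpha times the top probability; the resulting
        # candidate set stands in for a credal (imprecise) label.
        return probs >= alpha * probs.max()

    def optimistic_superset_loss(probs, candidates, eps=1e-12):
        # Optimistic superset loss: the smallest cross-entropy over the candidate
        # classes, i.e. the learner is penalized only against its best-matching candidate.
        best = np.where(candidates, probs, 0.0).max()
        return -np.log(best + eps)

    p = np.array([0.55, 0.30, 0.10, 0.05])   # current prediction on an unlabeled instance
    candidates = credal_pseudo_label(p)      # -> [True, True, False, False]
    print(optimistic_superset_loss(p, candidates))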
A Boltzmann machine is a stochastic neural network that has been extensively used in the layers of deep architectures for modern machine learning applications. In this paper, we develop a Boltzmann machine that is capable of modelling thermodynamic observables for physical systems in thermal equilibrium. Through unsupervised learning, we train the Boltzmann machine on data sets constructed with spin configurations importance-sampled from the partition function of an Ising Hamiltonian at different temperatures using Monte Carlo (MC) methods. The trained Boltzmann machine is then used to generate spin states, for which we compare thermodynamic observables to those computed by direct MC sampling. We demonstrate that the Boltzmann machine can faithfully reproduce the observables of the physical system. Further, we observe that the number of neurons required to obtain accurate results increases as the system is brought close to criticality.
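A minimal sketch of the kind of generative model involved, assuming a binary restricted Boltzmann machine trained with one-step contrastive divergence (CD-1) on binarized spin configurations; the architecture, learning rate, and the random stand-in for the Monte Carlo data are illustrative assumptions, not the paper's setup.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    class RBM:
        # Bernoulli RBM; spins s in {-1,+1} are mapped to units v in {0,1} via v = (s+1)/2.
        def __init__(self, n_visible, n_hidden, lr=0.05):
            self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
            self.b = np.zeros(n_visible)   # visible bias
            self.c = np.zeros(n_hidden)    # hidden bias
            self.lr = lr

        def sample_h(self, v):
            p = sigmoid(v @ self.W + self.c)
            return p, (rng.random(p.shape) < p).astype(float)

        def sample_v(self, h):
            p = sigmoid(h @ self.W.T + self.b)
            return p, (rng.random(p.shape) < p).astype(float)

        def cd1_step(self, v0):
            # One-step contrastive divergence: data phase minus reconstruction phase.
            ph0, h0 = self.sample_h(v0)
            _, v1 = self.sample_v(h0)
            ph1, _ = self.sample_h(v1)
            self.W += self.lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
            self.b += self.lr * (v0 - v1).mean(axis=0)
            self.c += self.lr * (ph0 - ph1).mean(axis=0)

    # Usage sketch: `spins` would hold MC-sampled Ising configurations of shape
    # (n_samples, n_spins); random spins are used here only as a placeholder.
    spins = rng.choice([-1, 1], size=(256, 16))
    data = (spins + 1) / 2.0
    rbm = RBM(n_visible=16, n_hidden=8)
    for _ in range(100):
        rbm.cd1_step(data)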
We develop the stochastic approach to thermodynamics based on stochastic dynamics, which can be discrete (master equation) or continuous (Fokker-Planck equation), and on two assumptions concerning entropy. The first is the definition of entropy itself; the second is the definition of the entropy production rate, which is nonnegative and vanishes in thermodynamic equilibrium. Based on these assumptions, we study interacting systems with many degrees of freedom, in or out of thermodynamic equilibrium, and how the macroscopic laws are derived from the stochastic dynamics. These studies include quasi-equilibrium processes, the convexity of the equilibrium surface, the monotonic time behavior of thermodynamic potentials (including entropy), the bilinear form of the entropy production rate, the Onsager coefficients and reciprocal relations, and the nonequilibrium steady states of chemical reactions.
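In the master-equation case, the two assumptions take a standard form (notation assumed here, not necessarily the paper's: P_i is the probability of state i and W_{ij} the transition rate from state j to state i):

    S = -\sum_i P_i \ln P_i, \qquad
    \Pi = \frac{1}{2} \sum_{i,j} \big( W_{ij} P_j - W_{ji} P_i \big)
          \ln \frac{W_{ij} P_j}{W_{ji} P_i} \;\ge\; 0,

with \Pi = 0 exactly when detailed balance, W_{ij} P_j = W_{ji} P_i, holds, i.e. in thermodynamic equilibrium.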
Self-supervised learning (especially contrastive learning) has attracted great interest due to its tremendous potential for learning discriminative representations in an unsupervised manner. Despite the acknowledged successes, existing contrastive learning methods suffer from very low learning efficiency, e.g., taking about ten times more training epochs than supervised learning to reach comparable recognition accuracy. In this paper, we discover two contradictory phenomena in contrastive learning, which we call the under-clustering and over-clustering problems, and which are major obstacles to learning efficiency. Under-clustering means that the model cannot efficiently learn the dissimilarity between inter-class samples when the negative sample pairs are insufficient to differentiate all the actual object categories. Over-clustering means that the model cannot efficiently learn feature representations from excessive negative sample pairs, which force the model to over-cluster samples of the same actual category into different clusters. To overcome both problems simultaneously, we propose a novel self-supervised learning framework using a median triplet loss. Specifically, we employ a triplet loss that tends to maximize the relative distance between the positive pair and the negative pairs, addressing the under-clustering problem; and we construct each negative pair by selecting the negative sample with the median similarity score among all negative samples, avoiding the over-clustering problem with a guarantee derived from a Bernoulli distribution model. We extensively evaluate the proposed framework on several large-scale benchmarks (e.g., ImageNet, SYSU-30k, and COCO). The results demonstrate that our model outperforms the latest state-of-the-art methods by a clear margin, particularly in learning efficiency. Code is available at: https://github.com/wanggrun/triplet.
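A minimal sketch of the median-negative idea, assuming cosine similarities and a standard margin-based triplet loss; the function name, margin value, and selection rule below are illustrative assumptions, not the released code.

    import torch
    import torch.nn.functional as F

    def median_triplet_loss(anchor, positive, negatives, margin=0.3):
        # anchor, positive: (d,) embeddings; negatives: (n, d) embeddings.
        # Instead of the hardest negative, pick the negative whose similarity
        # to the anchor is the median over all negatives.
        anchor = F.normalize(anchor, dim=0)
        positive = F.normalize(positive, dim=0)
        negatives = F.normalize(negatives, dim=1)
        sims = negatives @ anchor                          # cosine similarity to each negative
        median_neg = negatives[torch.argsort(sims)[len(sims) // 2]]
        d_pos = 1.0 - anchor @ positive                    # cosine distance to the positive
        d_neg = 1.0 - anchor @ median_neg                  # cosine distance to the chosen negative
        return F.relu(d_pos - d_neg + margin)              # hinge: push d_pos below d_neg by the margin

    # usage with random embeddings
    a, p = torch.randn(128), torch.randn(128)
    negs = torch.randn(64, 128)
    print(median_triplet_loss(a, p, negs))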