No Arabic abstract
The task of mapping two or more distributions to a shared representation has many applications including fair representations, batch effect mitigation, and unsupervised domain adaptation. However, most existing formulations only consider the setting of two distributions, and moreover, do not have an identifiable, unique shared latent representation. We use optimal transport theory to consider a natural multiple distribution extension of the Monge assignment problem we call the symmetric Monge map problem and show that it is equivalent to the Wasserstein barycenter problem. Yet, the maps to the barycenter are challenging to estimate. Prior methods often ignore transportation cost, rely on adversarial methods, or only work for discrete distributions. Therefore, our goal is to estimate invertible maps between two or more distributions and their corresponding barycenter via a simple iterative flow method. Our method decouples each iteration into two subproblems: 1) estimate simple distributions and 2) estimate the invertible maps to the barycenter via known closed-form OT results. Our empirical results give evidence that this iterative algorithm approximates the maps to the barycenter.
In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement between the models. We show applications of Wasserstein ensembling in attribute-based classification, multilabel learning and image captioning generation. These results show that the W. ensembling is a viable alternative to the basic geometric or arithmetic mean ensembling.
This work presents an algorithm to sample from the Wasserstein barycenter of absolutely continuous measures. Our method is based on the gradient flow of the multimarginal formulation of the Wasserstein barycenter, with an additive penalization to account for the marginal constraints. We prove that the minimum of this penalized multimarginal formulation is achieved for a coupling that is close to the Wasserstein barycenter. The performances of the algorithm are showcased in several settings.
When manipulating three-dimensional data, it is possible to ensure that rotational and translational symmetries are respected by applying so-called SE(3)-equivariant models. Protein structure prediction is a prominent example of a task which displays these symmetries. Recent work in this area has successfully made use of an SE(3)-equivariant model, applying an iterative SE(3)-equivariant attention mechanism. Motivated by this application, we implement an iterative version of the SE(3)-Transformer, an SE(3)-equivariant attention-based model for graph data. We address the additional complications which arise when applying the SE(3)-Transformer in an iterative fashion, compare the iterative and single-pa
Iterative machine teaching is a method for selecting an optimal teaching example that enables a student to efficiently learn a target concept at each iteration. Existing studies on iterative machine teaching are based on supervised machine learning and assume that there are teachers who know the true answers of all teaching examples. In this study, we consider an unsupervised case where such teachers do not exist; that is, we cannot access the true answer of any teaching example. Students are given a teaching example at each iteration, but there is no guarantee if the corresponding label is correct. Recent studies on crowdsourcing have developed methods for estimating the true answers from crowdsourcing responses. In this study, we apply these to iterative machine teaching for estimating the true labels of teaching examples along with student models that are used for teaching. Our method supports the collaborative learning of students without teachers. The experimental results show that the teaching performance of our method is particularly effective for low-level students in particular.
Event sequences can be modeled by temporal point processes (TPPs) to capture their asynchronous and probabilistic nature. We propose an intensity-free framework that directly models the point process distribution by utilizing normalizing flows. This approach is capable of capturing highly complex temporal distributions and does not rely on restrictive parametric forms. Comparisons with state-of-the-art baseline models on both synthetic and challenging real-life datasets show that the proposed framework is effective at modeling the stochasticity of discrete event sequences.