This work presents an algorithm to sample from the Wasserstein barycenter of absolutely continuous measures. Our method is based on the gradient flow of the multimarginal formulation of the Wasserstein barycenter, with an additive penalization to account for the marginal constraints. We prove that the minimum of this penalized multimarginal formulation is achieved for a coupling that is close to the Wasserstein barycenter. The performance of the algorithm is showcased in several settings.
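For concreteness, the penalized objective can be written in the following general form; this is a hedged sketch in which the quadratic multimarginal cost, the penalty functional $D$, and the $1/\varepsilon$ weighting are assumptions rather than the paper's exact choices:

```latex
\min_{\gamma \ge 0}\;
\int \sum_{i=1}^{m} \lambda_i \bigl| x_i - B(x) \bigr|^2 \,\mathrm{d}\gamma(x_1,\dots,x_m)
\;+\; \frac{1}{\varepsilon} \sum_{i=1}^{m} D\!\bigl( (\pi_i)_{\#}\gamma,\, \mu_i \bigr),
\qquad B(x) = \sum_{i=1}^{m} \lambda_i x_i,
```

where $(\pi_i)_{\#}\gamma$ denotes the $i$-th marginal of the coupling $\gamma$ and $\mu_1,\dots,\mu_m$ are the input measures; as $\varepsilon \to 0$ the penalty enforces the marginal constraints, and the barycenter is recovered as the law of $B(x)$ under the optimal coupling.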
In this paper we propose to perform model ensembling in multiclass and multilabel learning settings using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow us to incorporate semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics when seeking agreement between the models. We show applications of Wasserstein ensembling in attribute-based classification, multilabel learning, and image captioning. These results show that W. ensembling is a viable alternative to basic geometric or arithmetic mean ensembling.
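A minimal sketch of the core operation, using the POT library's entropic-regularized barycenter; the cost matrix built from class embeddings (standing in for semantic side information such as word embeddings) is an assumption here, not the paper's exact construction:

```python
# Minimal sketch: Wasserstein-barycenter ensembling of classifier outputs.
# Requires the POT library (pip install pot). `embeddings` is a hypothetical
# stand-in for semantic side information such as word embeddings.
import numpy as np
import ot  # Python Optimal Transport

def wasserstein_ensemble(probs, embeddings, reg=1e-2, weights=None):
    """probs: (n_classes, n_models) column-stochastic model predictions.
    embeddings: (n_classes, d) class embeddings defining the ground cost.
    Returns the entropic-regularized Wasserstein barycenter over classes."""
    # Ground cost: squared Euclidean distance between class embeddings.
    M = ot.dist(embeddings, embeddings)
    M /= M.max()  # normalize for numerical stability
    return ot.bregman.barycenter(probs, M, reg, weights=weights)

# Example: ensemble three models over four classes.
rng = np.random.default_rng(0)
probs = rng.random((4, 3))
probs /= probs.sum(axis=0, keepdims=True)  # each column is a distribution
emb = rng.standard_normal((4, 8))
consensus = wasserstein_ensemble(probs, emb)
print(consensus)  # a single consensus distribution over the four classes
```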
The task of mapping two or more distributions to a shared representation has many applications, including fair representations, batch effect mitigation, and unsupervised domain adaptation. However, most existing formulations only consider the setting of two distributions and, moreover, do not have an identifiable, unique shared latent representation. We use optimal transport theory to consider a natural multiple-distribution extension of the Monge assignment problem, which we call the symmetric Monge map problem, and show that it is equivalent to the Wasserstein barycenter problem. Yet the maps to the barycenter are challenging to estimate: prior methods often ignore transportation cost, rely on adversarial methods, or only work for discrete distributions. Our goal is therefore to estimate invertible maps between two or more distributions and their corresponding barycenter via a simple iterative flow method. Our method decouples each iteration into two subproblems: 1) estimating simple distributions, and 2) estimating the invertible maps to the barycenter via known closed-form OT results. Our empirical results give evidence that this iterative algorithm approximates the maps to the barycenter.
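A hedged sketch of the closed-form ingredient such a step can use; taking Gaussians as the "simple distributions" is an assumption here, and the function names are illustrative:

```python
# Sketch: closed-form OT between zero-mean Gaussians, one natural choice
# for the "simple distributions" subproblem.
import numpy as np
from scipy.linalg import sqrtm

def psd_sqrt(M):
    return np.real(sqrtm(M))  # discard spurious imaginary round-off

def gaussian_barycenter_cov(covs, weights, n_iter=50):
    """Fixed-point iteration for the covariance of the 2-Wasserstein
    barycenter of zero-mean Gaussians."""
    S = np.mean(covs, axis=0)  # initial guess
    for _ in range(n_iter):
        root, inv_root = psd_sqrt(S), np.linalg.inv(psd_sqrt(S))
        inner = sum(w * psd_sqrt(root @ C @ root) for w, C in zip(weights, covs))
        S = inv_root @ inner @ inner @ inv_root
    return S

def gaussian_monge_map(cov_src, cov_tgt):
    """Linear Monge map x -> A x sending N(0, cov_src) to N(0, cov_tgt)."""
    root, inv_root = psd_sqrt(cov_src), np.linalg.inv(psd_sqrt(cov_src))
    return inv_root @ psd_sqrt(root @ cov_tgt @ root) @ inv_root

covs = [np.array([[2.0, 0.5], [0.5, 1.0]]),
        np.array([[1.0, -0.3], [-0.3, 3.0]])]
S = gaussian_barycenter_cov(covs, weights=[0.5, 0.5])
A = gaussian_monge_map(covs[0], S)  # pushes the first Gaussian to the barycenter
```

The map is invertible (its inverse sends the barycenter back to the source Gaussian), which matches the invertibility requirement stated above.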
Learning generative models is challenging for a network edge node with limited data and computing power. Since tasks in similar environments share model similarity, it is plausible to leverage pre-trained generative models from the cloud or from other edge nodes. Appealing to optimal transport theory tailored to Wasserstein-1 generative adversarial networks (WGANs), this study develops a framework that systematically optimizes continual learning of generative models using local data at the edge node while exploiting adaptive coalescence of pre-trained generative models. Specifically, by treating the knowledge transferred from other nodes as Wasserstein balls centered around their pre-trained models, continual learning of generative models is cast as a constrained optimization problem, which is further reduced to a Wasserstein-1 barycenter problem. A two-stage approach is devised accordingly: 1) the barycenters among the pre-trained models are computed offline, where displacement interpolation serves as the theoretic foundation for finding adaptive barycenters via a recursive WGAN configuration; 2) the barycenter computed offline is used as the meta-model initialization for continual learning, and fast adaptation is then carried out to find the generative model using the local samples at the target edge node. Finally, a weight ternarization method, based on the joint optimization of weights and the quantization threshold, is developed to further compress the generative model.
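As a hedged sketch of the final compression step: the threshold heuristic below follows the common ternary-weight-network recipe and is a stand-in for the paper's joint weight/threshold optimization, not its exact procedure:

```python
# Sketch: ternarize a weight tensor to {-alpha, 0, +alpha}. The 0.7*mean
# threshold is the standard ternary-weight-network heuristic, used here
# only as an illustrative stand-in.
import numpy as np

def ternarize(w, thresh_factor=0.7):
    delta = thresh_factor * np.mean(np.abs(w))             # quantization threshold
    mask = np.abs(w) > delta                               # positions kept nonzero
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0  # shared scaling factor
    return alpha * np.sign(w) * mask

w = np.random.randn(4, 4)
print(ternarize(w))  # each entry is -alpha, 0, or +alpha
```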
The Wasserstein barycenter has been widely studied in various fields, including natural language processing and computer vision. However, solving the Wasserstein barycenter problem is computationally expensive because computing the Wasserstein distance requires quadratic time with respect to the number of supports. By contrast, the Wasserstein distance on a tree, called the tree-Wasserstein distance, can be computed in linear time and allows for the fast comparison of a large number of distributions. In this study, we propose a barycenter under the tree-Wasserstein distance, called the fixed-support tree-Wasserstein barycenter (FS-TWB), and its extension, called the fixed-support tree-sliced Wasserstein barycenter (FS-TSWB). More specifically, we first show that the FS-TWB and FS-TSWB problems are convex optimization problems that can be solved by projected subgradient descent. Moreover, we propose a more efficient algorithm to compute the subgradient and objective function value by using properties of the tree-Wasserstein barycenter problem. Through real-world experiments, we show that, with the proposed algorithm, the FS-TWB and FS-TSWB can be solved two orders of magnitude faster than the original Wasserstein barycenter problem.
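For intuition on why the tree-Wasserstein distance is linear-time, here is a minimal sketch (the parent-array tree encoding is an assumption): each edge contributes its weight times the absolute difference of the subtree masses of the two distributions.

```python
# Sketch: tree-Wasserstein distance in O(n) for a tree given as a parent
# array (parent[0] == -1 marks the root; children have larger indices
# than their parents, i.e., nodes are in topological order).
import numpy as np

def tree_wasserstein(parent, edge_weight, mu, nu):
    """parent[i]: parent of node i; edge_weight[i]: weight of edge (i, parent[i]).
    mu, nu: probability distributions supported on the nodes."""
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    total = 0.0
    # Sweep leaves-to-root: diff[i] accumulates the subtree mass difference.
    for i in range(len(parent) - 1, 0, -1):
        total += edge_weight[i] * abs(diff[i])
        diff[parent[i]] += diff[i]
    return total

# Chain 0-1-2 with unit edge weights: moving one unit of mass from node 0
# to node 2 costs distance 2.
print(tree_wasserstein([-1, 0, 1], [0.0, 1.0, 1.0], [1, 0, 0], [0, 0, 1]))  # 2.0
```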
In this paper, we focus on computational aspects of the Wasserstein barycenter problem. We propose two algorithms to compute Wasserstein barycenters of $m$ discrete measures of size $n$ to accuracy $\varepsilon$. The first algorithm, based on mirror prox with a specific norm, matches the complexity of the celebrated accelerated iterative Bregman projections (IBP), namely $\widetilde{O}(mn^2\sqrt{n}/\varepsilon)$, but, unlike (accelerated) IBP, does not suffer from numerical instability under small regularization parameters. The second algorithm, based on area-convexity and dual extrapolation, improves on the previously best-known convergence rates for the Wasserstein barycenter problem, achieving $\widetilde{O}(mn^2/\varepsilon)$ complexity.
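Both algorithms target the same unregularized fixed-support objective; for reference, it can be written as follows (uniform weights are assumed here):

```latex
\min_{p \in \Delta_n} \; \frac{1}{m} \sum_{i=1}^{m} W(p, q_i),
\qquad
W(p, q) \;=\; \min_{\substack{\pi \ge 0 \\ \pi \mathbf{1} = p,\; \pi^{\top}\mathbf{1} = q}} \langle C, \pi \rangle,
```

where $\Delta_n$ is the probability simplex, $C \in \mathbb{R}_{+}^{n \times n}$ is the ground cost matrix, and $q_1, \dots, q_m$ are the input measures.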