Regularization in Optimal Transport (OT) problems has been shown to critically affect the associated computational and sample complexities. It has also been observed that regularization helps in handling noisy marginals as well as marginals with unequal masses. However, existing works on OT restrict themselves to $\phi$-divergence-based regularization. In this work, we propose and analyze Integral Probability Metric (IPM) based regularization in OT problems. While it is expected that the well-established advantages of IPMs are inherited by the IPM-regularized OT variants, we interestingly observe that some useful aspects of $\phi$-regularization are preserved. For example, we show that the OT formulation, where the marginal constraints are relaxed using IPM-regularization, also lifts the ground metric to that over (perhaps un-normalized) measures. In fact, the lifted metric turns out to be another IPM whose generating set is the intersection of that of the IPM employed for regularization and the set of 1-Lipschitz functions under the ground metric. Also, in the special case where the regularization is squared maximum mean discrepancy based, the proposed OT variant, as well as the corresponding Barycenter formulation, turn out to be those of minimizing a convex quadratic subject to non-negativity/simplex constraints and hence can be solved efficiently. Simulations confirm that the optimal transport plans/maps obtained with IPM-regularization are intrinsically different from those obtained with $\phi$-regularization. Empirical results illustrate the efficacy of the proposed IPM-regularized OT formulation. This draft contains the main paper and the Appendices.
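For orientation, the standard definition of an IPM and the lifted metric described above can be written as follows. The symbols $\mathcal{G}$ (generating set of the regularizing IPM), $c$ (ground metric), and $\mathrm{Lip}_1(c)$ (functions that are 1-Lipschitz under $c$) are notational assumptions made here, not taken from the abstract, and the second line only restates the abstract's claim in symbols.

```latex
% Standard IPM generated by a function class \mathcal{G}
\gamma_{\mathcal{G}}(\mu, \nu)
  = \sup_{f \in \mathcal{G}} \left| \int f \, d\mu - \int f \, d\nu \right|

% Lifted metric as stated in the abstract: an IPM whose generating set is
% the intersection of \mathcal{G} with the 1-Lipschitz functions under c
\gamma_{\mathcal{G} \cap \mathrm{Lip}_1(c)}(\mu, \nu)
  = \sup_{f \in \mathcal{G} \cap \mathrm{Lip}_1(c)}
    \left| \int f \, d\mu - \int f \, d\nu \right|
```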
We propose the use of Flat Metric to assess the performance of reconstruction methods for single-molecule localization microscopy (SMLM) in scenarios where the ground-truth is available. Flat Metric is intimately related to the concept of optimal transport between measures of different mass, providing solid mathematical foundations for SMLM evaluation and integrating both localization and detection performance. In this paper, we provide the foundations of Flat Metric and validate this measure by applying it to controlled synthetic examples and to data from the SMLM 2016 Challenge.
Mixup is a popular regularization technique for training deep neural networks that can improve generalization and increase adversarial robustness. It perturbs input training data in the direction of other randomly-chosen instances in the training set. To better leverage the structure of the data, we extend mixup to \emph{$k$-mixup} by perturbing $k$-batches of training points in the direction of other $k$-batches using displacement interpolation, i.e., interpolation under the Wasserstein metric. We demonstrate theoretically and in simulations that $k$-mixup preserves cluster and manifold structures, and we extend theory studying efficacy of standard mixup. Our empirical results show that training with $k$-mixup further improves generalization and robustness on benchmark datasets.
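The idea of interpolating between two $k$-batches along an optimal matching can be illustrated with a minimal sketch. It assumes squared Euclidean cost, equal-size batches with uniform weights, and one-hot labels. The function names (`optimal_matching`, `k_mixup`) and the mixing weight `lam` are hypothetical, and the brute-force search over permutations is a stand-in for a proper assignment solver, practical only for very small $k$. This is not the authors' implementation.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def optimal_matching(batch_a, batch_b):
    """Return the permutation perm minimizing sum_i ||a_i - b_perm[i]||^2.

    Brute force over all k! permutations: illustration only.
    """
    k = len(batch_a)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(k)):
        cost = sum(sq_dist(batch_a[i], batch_b[perm[i]]) for i in range(k))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm


def interpolate(u, v, lam):
    """Pointwise (1 - lam) * u + lam * v."""
    return [(1.0 - lam) * x + lam * y for x, y in zip(u, v)]


def k_mixup(batch_a, labels_a, batch_b, labels_b, lam):
    """Move each point of batch_a a fraction lam toward its matched point
    in batch_b, and mix the (one-hot) labels with the same weight."""
    perm = optimal_matching(batch_a, batch_b)
    k = len(batch_a)
    mixed_x = [interpolate(batch_a[i], batch_b[perm[i]], lam) for i in range(k)]
    mixed_y = [interpolate(labels_a[i], labels_b[perm[i]], lam) for i in range(k)]
    return mixed_x, mixed_y


# Two clusters: each point is paired with its nearby counterpart,
# so the interpolated points stay inside their cluster.
xa = [[0.0, 0.0], [10.0, 10.0]]
ya = [[1.0, 0.0], [0.0, 1.0]]
xb = [[11.0, 11.0], [1.0, 1.0]]
yb = [[0.0, 1.0], [1.0, 0.0]]
mixed_x, mixed_y = k_mixup(xa, ya, xb, yb, lam=0.5)
```

With $k = 1$ the matching is trivial and the sketch reduces to ordinary mixup between two samples. For larger batches, a polynomial-time assignment solver would replace the permutation search.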
We study the free probabilistic analog of optimal couplings for the quadratic cost, where classical probability spaces are replaced by tracial von Neumann algebras and probability measures on $\mathbb{R}^m$ are replaced by non-commutative laws of $m$-tuples. We prove an analog of the Monge-Kantorovich duality which characterizes optimal couplings of non-commutative laws with respect to Biane and Voiculescu's non-commutative $L^2$-Wasserstein distance using a new type of convex functions. As a consequence, we show that if $(X,Y)$ is a pair of optimally coupled $m$-tuples of non-commutative random variables in a tracial $\mathrm{W}^*$-algebra $\mathcal{A}$, then $\mathrm{W}^*((1 - t)X + tY) = \mathrm{W}^*(X,Y)$ for all $t \in (0,1)$. Finally, we illustrate the subtleties of non-commutative optimal couplings through connections with results in quantum information theory and operator algebras. For instance, two non-commutative laws that can be realized in finite-dimensional algebras may still require an infinite-dimensional algebra to optimally couple. Moreover, the space of non-commutative laws of $m$-tuples is not separable with respect to the Wasserstein distance for $m > 1$.
We provide a survey of recent results on model calibration by Optimal Transport. We present the general framework and then discuss the calibration of local, and local-stochastic, volatility models to European options, the joint VIX/SPX calibration problem as well as calibration to some path-dependent options. We explain the numerical algorithms and present examples both on synthetic and market data.
Several recent publications report advances in training optimal decision trees (ODT) using mixed-integer programs (MIP), due to algorithmic advances in integer programming and a growing interest in addressing the inherent suboptimality of heuristic approaches such as CART. In this paper, we propose a novel MIP formulation, based on a 1-norm support vector machine model, to train a multivariate ODT for classification problems. We provide cutting plane techniques that tighten the linear relaxation of the MIP formulation, in order to improve run times to reach optimality. Using 36 data-sets from the University of California Irvine Machine Learning Repository, we demonstrate that our formulation outperforms its counterparts in the literature by an average of about 10% in terms of mean out-of-sample testing accuracy across the data-sets. We provide a scalable framework to train multivariate ODT on large data-sets by introducing a novel linear programming (LP)-based data selection method to choose a subset of the data for training. Our method is able to routinely handle large data-sets with more than 7,000 sample points and outperform heuristic methods and other MIP-based techniques. We present results on data-sets containing up to 245,000 samples. Existing MIP-based methods do not scale well on training data-sets beyond 5,500 samples.