No Arabic abstract
Optimal transport maps define a one-to-one correspondence between probability distributions, and as such have grown popular for machine learning applications. However, these maps are generally defined on empirical observations and cannot be generalized to new samples while preserving asymptotic properties. We extend a novel method to learn a consistent estimator of a continuous optimal transport map from two empirical distributions. The consequences of this work are two-fold: first, it enables to extend the transport plan to new observations without computing again the discrete optimal transport map; second, it provides statistical guarantees to machine learning applications of optimal transport. We illustrate the strength of this approach by deriving a consistent framework for transport-based counterfactual explanations in fairness.
Optimal Transport (OT) theory has seen an increasing amount of attention from the computer science community due to its potency and relevance in modeling and machine learning. It introduces means that serve as powerful ways to compare probability distributions with each other, as well as producing optimal mappings to minimize cost functions. In this survey, we present a brief introduction and history, a survey of previous work and propose directions of future study. We will begin by looking at the history of optimal transport and introducing the founders of this field. We then give a brief glance into the algorithms related to OT. Then, we will follow up with a mathematical formulation and the prerequisites to understand OT. These include Kantorovich duality, entropic regularization, KL Divergence, and Wassertein barycenters. Since OT is a computationally expensive problem, we then introduce the entropy-regularized version of computing optimal mappings, which allowed OT problems to become applicable in a wide range of machine learning problems. In fact, the methods generated from OT theory are competitive with the current state-of-the-art methods. We follow this up by breaking down research papers that focus on image processing, graph learning, neural architecture search, document representation, and domain adaptation. We close the paper with a small section on future research. Of the recommendations presented, three main problems are fundamental to allow OT to become widely applicable but rely strongly on its mathematical formulation and thus are hardest to answer. Since OT is a novel method, there is plenty of space for new research, and with more and more competitive methods (either on an accuracy level or computational speed level) being created, the future of applied optimal transport is bright as it has become pervasive in machine learning.
We present new results for consistency of maximum likelihood estimators with a focus on multivariate mixed models. Our theory builds on the idea of using subsets of the full data to establish consistency of estimators based on the full data. It requires neither that the data consist of independent observations, nor that the observations can be modeled as a stationary stochastic process. Compared to existing asymptotic theory using the idea of subsets we substantially weaken the assumptions, bringing them closer to what suffices in classical settings. We apply our theory in two multivariate mixed models for which it was unknown whether maximum likelihood estimators are consistent. The models we consider have non-stochastic predictors and multivariate responses which are possibly mixed-type (some discrete and some continuous).
We extend Hoeffdings lemma to general-state-space and not necessarily reversible Markov chains. Let ${X_i}_{i ge 1}$ be a stationary Markov chain with invariant measure $pi$ and absolute spectral gap $1-lambda$, where $lambda$ is defined as the operator norm of the transition kernel acting on mean zero and square-integrable functions with respect to $pi$. Then, for any bounded functions $f_i: x mapsto [a_i,b_i]$, the sum of $f_i(X_i)$ is sub-Gaussian with variance proxy $frac{1+lambda}{1-lambda} cdot sum_i frac{(b_i-a_i)^2}{4}$. This result differs from the classical Hoeffdings lemma by a multiplicative coefficient of $(1+lambda)/(1-lambda)$, and simplifies to the latter when $lambda = 0$. The counterpart of Hoeffdings inequality for Markov chains immediately follows. Our results assume none of countable state space, reversibility and time-homogeneity of Markov chains and cover time-dependent functions with various ranges. We illustrate the utility of these results by applying them to six problems in statistics and machine learning.
The aim of this paper is to extend Azzalinis method. This extension is done in two stages: consider two dependent and non-identically distributed random variables say $X_1$ and $X_2$; model the dependence between $X_1$ and $X_2$ by a copula. To illustrate the new method, we assume $X_1$ and $X_2$ are exponential random variables. This assumption leads to a new distribution called the Generalized Weighted Exponential Distribution (GWED), a generalization of Gupta and Kundu (2009)s Weighted Exponential Distribution (WED). Some mathematical properties of the GWED are derived, and its parameters estimated by maximum likelihood. The GWED is applied to biochemical data sets showing its good performance compared to the WED.
The dual problem of testing the predictive significance of a particular covariate, and identification of the set of relevant covariates is common in applied research and methodological investigations. To study this problem in the context of functional linear regression models with predictor variables observed over a grid and a scalar response, we consider basis expansions of the functional covariates and apply the likelihood ratio test. Based on p-values from testing each predictor, we propose a new variable selection method, which is consistent in selecting the relevant predictors from set of available predictors that is allowed to grow with the sample size n. Numerical simulations suggest that the proposed variable selection procedure outperforms existing methods found in the literature. A real dataset from weather stations in Japan is analyzed.