We study the multi-marginal partial optimal transport (POT) problem between $m$ discrete (unbalanced) measures with at most $n$ supports. We first prove that we can obtain two equivalence forms of the multimarginal POT problem in terms of the multima
rginal optimal transport problem via novel extensions of cost tensor. The first equivalence form is derived under the assumptions that the total masses of each measure are sufficiently close while the second equivalence form does not require any conditions on these masses but at the price of more sophisticated extended cost tensor. Our proof techniques for obtaining these equivalence forms rely on novel procedures of moving mass in graph theory to push transportation plan into appropriate regions. Finally, based on the equivalence forms, we develop optimization algorithm, named ApproxMPOT algorithm, that builds upon the Sinkhorn algorithm for solving the entropic regularized multimarginal optimal transport. We demonstrate that the ApproxMPOT algorithm can approximate the optimal value of multimarginal POT problem with a computational complexity upper bound of the order $tilde{mathcal{O}}(m^3(n+1)^{m}/ varepsilon^2)$ where $varepsilon > 0$ stands for the desired tolerance.
A common technique for compressing a neural network is to compute the $k$-rank $ell_2$ approximation $A_{k,2}$ of the matrix $Ainmathbb{R}^{ntimes d}$ that corresponds to a fully connected layer (or embedding layer). Here, $d$ is the number of the ne
urons in the layer, $n$ is the number in the next one, and $A_{k,2}$ can be stored in $O((n+d)k)$ memory instead of $O(nd)$. This $ell_2$-approximation minimizes the sum over every entry to the power of $p=2$ in the matrix $A - A_{k,2}$, among every matrix $A_{k,2}inmathbb{R}^{ntimes d}$ whose rank is $k$. While it can be computed efficiently via SVD, the $ell_2$-approximation is known to be very sensitive to outliers (far-away rows). Hence, machine learning uses e.g. Lasso Regression, $ell_1$-regularization, and $ell_1$-SVM that use the $ell_1$-norm. This paper suggests to replace the $k$-rank $ell_2$ approximation by $ell_p$, for $pin [1,2]$. We then provide practical and provable approximation algorithms to compute it for any $pgeq1$, based on modern techniques in computational geometry. Extensive experimental results on the GLUE benchmark for compressing BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage. For example, our approach achieves $28%$ compression of RoBERTas embedding layer with only $0.63%$ additive drop in the accuracy (without fine-tuning) in average over all tasks in GLUE, compared to $11%$ drop using the existing $ell_2$-approximation. Open code is provided for reproducing and extending our results.
We provide a computational complexity analysis for the Sinkhorn algorithm that solves the entropic regularized Unbalanced Optimal Transport (UOT) problem between two measures of possibly different masses with at most $n$ components. We show that the
complexity of the Sinkhorn algorithm for finding an $varepsilon$-approximate solution to the UOT problem is of order $widetilde{mathcal{O}}(n^2/ varepsilon)$, which is near-linear time. To the best of our knowledge, this complexity is better than the complexity of the Sinkhorn algorithm for solving the Optimal Transport (OT) problem, which is of order $widetilde{mathcal{O}}(n^2/varepsilon^2)$. Our proof technique is based on the geometric convergence of the Sinkhorn updates to the optimal dual solution of the entropic regularized UOT problem and some properties of the primal solution. It is also different from the proof for the complexity of the Sinkhorn algorithm for approximating the OT problem since the UOT solution does not have to meet the marginal constraints.
Optimal Transport (OT) naturally arises in many machine learning applications, yet the heavy computational burden limits its wide-spread uses. To address the scalability issue, we propose an implicit generative learning-based framework called SPOT (S
calable Push-forward of Optimal Transport). Specifically, we approximate the optimal transport plan by a pushforward of a reference distribution, and cast the optimal transport problem into a minimax problem. We then can solve OT problems efficiently using primal dual stochastic gradient-type algorithms. We also show that we can recover the density of the optimal transport plan using neural ordinary differential equations. Numerical experiments on both synthetic and real datasets illustrate that SPOT is robust and has favorable convergence behavior. SPOT also allows us to efficiently sample from the optimal transport plan, which benefits downstream applications such as domain adaptation.
We introduce a learning-based algorithm for the low-rank decomposition problem: given an $n times d$ matrix $A$, and a parameter $k$, compute a rank-$k$ matrix $A$ that minimizes the approximation loss $|A-A|_F$. The algorithm uses a training set of
input matrices in order to optimize its performance. Specifically, some of the most efficient approximate algorithms for computing low-rank approximations proceed by computing a projection $SA$, where $S$ is a sparse random $m times n$ sketching matrix, and then performing the singular value decomposition of $SA$. We show how to replace the random matrix $S$ with a learned matrix of the same sparsity to reduce the error. Our experiments show that, for multiple types of data sets, a learned sketch matrix can substantially reduce the approximation loss compared to a random matrix $S$, sometimes by one order of magnitude. We also study mixed matrices where only some of the rows are trained and the remaining ones are random, and show that matrices still offer improved performance while retaining worst-case guarantees. Finally, to understand the theoretical aspects of our approach, we study the special case of $m=1$. In particular, we give an approximation algorithm for minimizing the empirical loss, with approximation factor depending on the stable rank of matrices in the training set. We also show generalization bounds for the sketch matrix learning problem.