In this work, we provide faster algorithms for approximating the optimal transport distance, e.g., the earth mover's distance, between two discrete probability distributions $\mu, \nu \in \Delta^n$. Given a cost function $C : [n] \times [n] \to \mathbb{R}_{\geq 0}$, where $C(i,j) \leq 1$ quantifies the penalty of transporting a unit of mass from $i$ to $j$, we show how to compute a coupling $X$ between $\mu$ and $\nu$ in time $\widetilde{O}(n^2/\epsilon)$ whose expected transportation cost is within an additive $\epsilon$ of optimal. This improves upon the previously best known running time for this problem of $\widetilde{O}(\min\{n^{9/4}/\epsilon, n^2/\epsilon^2\})$. We achieve our results by providing reductions from optimal transport to canonical optimization problems for which recent algorithmic efforts have provided nearly-linear time algorithms. Leveraging nearly-linear time algorithms for solving packing linear programs and for solving the matrix balancing problem, we obtain two separate proofs of our stated running time. Further, one of our algorithms is easily parallelized and can be implemented with depth $\widetilde{O}(1/\epsilon)$. Moreover, we show that further algorithmic improvements to our result would be surprising, in the sense that any improvement would yield an $o(n^{2.5})$ algorithm for \textit{maximum cardinality bipartite matching}, for which the only currently known algorithms achieving such a running time are based on fast matrix multiplication.
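To make the matrix-scaling viewpoint concrete, here is a minimal sketch under stated assumptions. It uses classical Sinkhorn scaling of the Gibbs kernel $\exp(-\eta C)$, which is not the packing-LP or matrix-balancing reduction described above and does not attain the $\widetilde{O}(n^2/\epsilon)$ bound, followed by a rounding step that repairs the marginals so the output is an exact coupling of $\mu$ and $\nu$. The function names and the parameters `eta` and `iters` are hypothetical choices for illustration.

```python
import math

def sinkhorn_coupling(C, mu, nu, eta=20.0, iters=200):
    """Approximate entropic OT plan by Sinkhorn scaling (illustrative, O(n^2) per iteration)."""
    n = len(mu)
    # Gibbs kernel K_ij = exp(-eta * C_ij); since C_ij <= 1, every entry is >= exp(-eta) > 0.
    K = [[math.exp(-eta * C[i][j]) for j in range(n)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(iters):
        # Rescale rows so row sums match mu, then columns so column sums match nu.
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(n)]

def round_to_marginals(F, mu, nu):
    """Turn an approximate plan into an exact coupling of mu and nu (both summing to 1)."""
    n = len(mu)
    F = [row[:] for row in F]  # work on a copy
    for i in range(n):  # shrink rows that carry too much mass
        s = sum(F[i])
        if s > mu[i]:
            F[i] = [f * mu[i] / s for f in F[i]]
    for j in range(n):  # shrink columns that carry too much mass
        s = sum(F[i][j] for i in range(n))
        if s > nu[j]:
            for i in range(n):
                F[i][j] *= nu[j] / s
    # Missing mass per row/column (clipped at 0 against floating-point noise).
    er = [max(0.0, mu[i] - sum(F[i])) for i in range(n)]
    ec = [max(0.0, nu[j] - sum(F[i][j] for i in range(n))) for j in range(n)]
    total = sum(er)
    if total > 0:  # add a rank-one correction that restores both marginals
        F = [[F[i][j] + er[i] * ec[j] / total for j in range(n)] for i in range(n)]
    return F

def transport_cost(C, X):
    """Total cost sum_{i,j} C(i,j) * X(i,j) of a coupling X."""
    return sum(C[i][j] * X[i][j] for i in range(len(X)) for j in range(len(X[i])))
```

For example, with $C = [[0,1],[1,0]]$, $\mu = (1/2, 1/2)$ and $\nu = (1, 0)$, the rounded coupling moves the second half-unit of mass to the first bin, so its cost is $1/2$ up to floating-point error. Larger `eta` brings the entropic plan closer to an optimal one, at the price of slower convergence and possible numerical underflow.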
We consider the problem of sampling and approximately counting an arbitrary given motif $H$ in a graph $G$, where access to $G$ is given via queries: degree, neighbor, and pair, as well as uniform edge sample queries. Previous algorithms for these tasks were based on a decomposition of $H$ into a collection of odd cycles and stars, denoted $\mathcal{D}^*(H)=\{O_{k_1}, \ldots, O_{k_q}, S_{p_1}, \ldots, S_{p_\ell}\}$. These algorithms were shown to be optimal for the case where $H$ is a clique or an odd-length cycle, but no other lower bounds were known. We present a new algorithm for sampling and approximately counting arbitrary motifs which, up to $\textrm{poly}(\log n)$ factors, is always at least as good as previous results, and for most graphs $G$ is strictly better. The main ingredient leading to this improvement is an improved uniform algorithm for sampling stars, which might be of independent interest, as it allows one to sample vertices according to the $p$-th moment of the degree distribution. Finally, we prove that this algorithm is \emph{decomposition-optimal} for decompositions that contain at least one odd cycle. These are the first lower bounds for motifs $H$ with a nontrivial decomposition, i.e., motifs that have more than a single component in their decomposition.
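The link between uniform star sampling and the $p$-th degree moment can be seen in a naive baseline. The sketch below assumes full adjacency-list access rather than the degree, neighbor, pair, and edge-sample query model above, so it is not the paper's sublinear algorithm; the function name `sample_star` and its parameters are hypothetical. Drawing a center with probability proportional to $d(v)^p$ and then $p$ independent neighbors gives a uniform pair $(v, t)$ with $t \in N(v)^p$; rejecting tuples with repeated petals leaves a uniform star with $p$ distinct ordered petals.

```python
import random

def sample_star(adj, p, rng=random, max_tries=10_000):
    """Uniformly sample a star S_p (center, p distinct ordered petals) with full adjacency lists.

    adj maps each vertex to a list of its neighbours (assumed input format).
    Returns None if the graph has no edges or every attempt is rejected.
    """
    vertices = list(adj)
    # Center weight deg(v)**p = number of ordered p-tuples of neighbours (with repetition),
    # so centers are drawn according to the p-th moment of the degree distribution.
    weights = [len(adj[v]) ** p for v in vertices]
    if sum(weights) == 0:
        return None
    for _ in range(max_tries):
        center = rng.choices(vertices, weights=weights, k=1)[0]
        petals = tuple(rng.choice(adj[center]) for _ in range(p))
        # Rejection keeps the accepted sample uniform over stars with distinct petals.
        if len(set(petals)) == p:
            return center, petals
    return None
```

On the star graph $K_{1,3}$ with $p = 2$, every accepted sample is centered at the hub, because a leaf has only one neighbor and its tuples are always rejected.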
We consider a given region $\Omega$ where the traffic flows according to two regimes: in a region $C$ the congestion is low, whereas in the remaining part $\Omega\setminus C$ the congestion is higher. The two congestion functions $H_1$ and $H_2$ are given, but the region $C$ has to be determined in an optimal way in order to minimize the total transportation cost. Various penalization terms on $C$ are considered and some numerical computations are shown.
We provide a survey of recent results on model calibration by Optimal Transport. We present the general framework and then discuss the calibration of local, and local-stochastic, volatility models to European options, the joint VIX/SPX calibration problem as well as calibration to some path-dependent options. We explain the numerical algorithms and present examples both on synthetic and market data.
We study the store-and-forward packet routing problem for simultaneous multicasts, in which multiple packets have to be forwarded along given trees as fast as possible. This is a natural generalization of the seminal work of Leighton, Maggs and Rao, which solved this problem for unicasts, i.e., the case where all trees are paths. They showed the existence of asymptotically optimal $O(C + D)$-length schedules, where the congestion $C$ is the maximum number of packets sent over an edge and the dilation $D$ is the maximum depth of a tree. This improves over the trivial $O(CD)$-length schedules. We prove a lower bound for multicasts which shows that schedules of non-trivial length, $o(CD)$, do not always exist. On the positive side, we construct $O(C+D+\log^2 n)$-length schedules in any $n$-node network. These schedules are near-optimal, since our lower bound shows that this length cannot be improved to $O(C+D) + o(\log n)$.
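The two parameters that govern schedule length are straightforward to compute from the input trees. The following sketch is a hypothetical helper, not taken from the paper; it assumes each multicast is given as a root plus a list of directed (parent, child) links, and that each packet crosses every link of its tree exactly once. It returns the congestion $C$ and the dilation $D$, whose product is the trivial schedule-length bound.

```python
from collections import Counter

def congestion_and_dilation(trees):
    """trees: list of (root, edges), edges being directed (parent, child) links (assumed format).

    Returns (C, D): C = max number of packets crossing any single link,
    D = max depth of any multicast tree.
    """
    load = Counter()
    D = 0
    for root, edges in trees:
        children = {}
        for parent, child in edges:
            load[(parent, child)] += 1  # one packet of this multicast crosses this link
            children.setdefault(parent, []).append(child)
        # Depth-first traversal from the root to find this tree's depth.
        depth = {root: 0}
        stack = [root]
        while stack:
            node = stack.pop()
            for nxt in children.get(node, []):
                depth[nxt] = depth[node] + 1
                stack.append(nxt)
        D = max(D, max(depth.values()))
    C = max(load.values(), default=0)
    return C, D
```

For instance, two multicasts rooted at `a` that both use the link `(a, b)`, one of depth 2 and one of depth 1, give $C = 2$ and $D = 2$.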
In this paper we study the fundamental problem of maintaining a dynamic collection of strings under the following operations: concat, which concatenates two strings; split, which splits a string into two at a given position; compare, which finds the lexicographical order (less, equal, greater) between two strings; and LCP, which calculates the longest common prefix of two strings. We present an efficient data structure for this problem, where an update requires only $O(\log n)$ worst-case time with high probability, with $n$ being the total length of all strings in the collection, and a query takes constant worst-case time. On the lower bound side, we prove that even if the only possible query is checking equality of two strings, either updates or queries take amortized $\Omega(\log n)$ time; hence our implementation is optimal. Such operations can be used as a basic building block to solve other string problems. We provide two examples. First, we can augment our data structure to provide pattern matching queries that may locate occurrences of a specified pattern $p$ in the strings in our collection in optimal $O(|p|)$ time, at the expense of increasing update time to $O(\log^2 n)$. Second, we show how to maintain a history of an edited text, processing updates in $O(\log t \log\log t)$ time, where $t$ is the number of edits, and how to support pattern matching queries against the whole history in $O(|p| \log t \log\log t)$ time. Finally, we note that our data structure can be applied to test dynamic tree isomorphism and to compare strings generated by dynamic straight-line grammars.
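Equality tests on concatenated strings are commonly supported with Karp-Rabin-style fingerprints, because the fingerprint of a concatenation can be computed from the fingerprints of its parts in constant time. The sketch below illustrates only that ingredient, assuming a random base modulo the Mersenne prime $2^{61}-1$; it is not the paper's data structure, and split, compare, and LCP would additionally need a balanced representation that is omitted here. The class `Fingerprint` and its methods are hypothetical names.

```python
import random
from dataclasses import dataclass

MOD = (1 << 61) - 1                      # Mersenne prime modulus
BASE = random.randrange(256, MOD - 1)    # random base: distinct strings rarely collide

@dataclass(frozen=True)
class Fingerprint:
    h: int   # sum of ord(s[i]) * BASE**(n-1-i), mod MOD
    n: int   # length of the string
    pw: int  # BASE**n mod MOD, needed to shift when concatenating

    @staticmethod
    def of(s: str) -> "Fingerprint":
        h = 0
        for ch in s:  # Horner evaluation of the polynomial hash
            h = (h * BASE + ord(ch)) % MOD
        return Fingerprint(h, len(s), pow(BASE, len(s), MOD))

    def concat(self, other: "Fingerprint") -> "Fingerprint":
        # hash(a + b) = hash(a) * BASE**len(b) + hash(b)
        return Fingerprint((self.h * other.pw + other.h) % MOD,
                           self.n + other.n,
                           (self.pw * other.pw) % MOD)

    def probably_equal(self, other: "Fingerprint") -> bool:
        # Never wrong for equal strings; wrong for distinct ones only with tiny probability.
        return self.n == other.n and self.h == other.h
```

For example, `Fingerprint.of("dyna").concat(Fingerprint.of("mic"))` compares equal to `Fingerprint.of("dynamic")` without rescanning either string.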