No Arabic abstract
We study approximation algorithms for variants of the emph{median string} problem, which asks for a string that minimizes the sum of edit distances from a given set of $m$ strings of length $n$. Only the straightforward $2$-approximation is known for this NP-hard problem. This problem is motivated e.g.~by computational biology, and belongs to the class of median problems (over different metric spaces), which are fundamental tasks in data analysis. Our main result is for the Ulam metric, where all strings are permutations over $[n]$ and each edit operation moves a symbol (deletion plus insertion). We devise for this problem an algorithms that breaks the $2$-approximation barrier, i.e., computes a $(2-delta)$-approximate median permutation for some constant $delta>0$ in time $tilde{O}(nm^2+n^3)$. We further use these techniques to achieve a $(2-delta)$ approximation for the median string problem in the special case where the median is restricted to length $n$ and the optimal objective is large $Omega(mn)$. We also design an approximation algorithm for the following probabilistic model of the Ulam median: the input consists of $m$ perturbations of an (unknown) permutation $x$, each generated by moving every symbol to a random position with probability (a parameter) $epsilon>0$. Our algorithm computes with high probability a $(1+o(1/epsilon))$-approximate median permutation in time $O(mn^2+n^3)$.
In 2015, Driemel, Krivov{s}ija and Sohler introduced the $(k,ell)$-median problem for clustering polygonal curves under the Frechet distance. Given a set of input curves, the problem asks to find $k$ median curves of at most $ell$ vertices each that minimize the sum of Frechet distances over all input curves to their closest median curve. A major shortcoming of their algorithm is that the input curves are restricted to lie on the real line. In this paper, we present a randomized bicriteria-approximation algorithm that works for polygonal curves in $mathbb{R}^d$ and achieves approximation factor $(1+epsilon)$ with respect to the clustering costs. The algorithm has worst-case running-time linear in the number of curves, polynomial in the maximum number of vertices per curve, i.e. their complexity, and exponential in $d$, $ell$, $epsilon$ and $delta$, i.e., the failure probability. We achieve this result through a shortcutting lemma, which guarantees the existence of a polygonal curve with similar cost as an optimal median curve of complexity $ell$, but of complexity at most $2ell-2$, and whose vertices can be computed efficiently. We combine this lemma with the superset-sampling technique by Kumar et al. to derive our clustering result. In doing so, we describe and analyze a generalization of the algorithm by Ackermann et al., which may be of independent interest.
The chaotic low energy region of the Fermi-Ulam simplified accelerator model is characterised by use of scaling analysis. It is shown that the average velocity and the roughness (variance of the average velocity) obey scaling functions with the same characteristic exponents. The formalism is widely applicable, including to billiards and to other chaotic systems.
We present a randomized approximation scheme for the permanent of a matrix with nonnegative entries. Our scheme extends a recursive rejection sampling method of Huber and Law (SODA 2008) by replacing the upper bound for the permanent with a linear combination of the subproblem bounds at a moderately large depth of the recursion tree. This method, we call deep rejection sampling, is empirically shown to outperform the basic, depth-zero variant, as well as a related method by Kuck et al. (NeurIPS 2019). We analyze the expected running time of the scheme on random $(0, 1)$-matrices where each entry is independently $1$ with probability $p$. Our bound is superior to a previous one for $p$ less than $1/5$, matching another bound that was known to hold when every row and column has density exactly $p$.
Thickenings of a metric space capture local geometric properties of the space. Here we exhibit applications of lower bounding the topology of thickenings of the circle and more generally the sphere. We explain interconnections with the geometry of circle actions on Euclidean space, the structure of zeros of trigonometric polynomials, and theorems of Borsuk-Ulam type. We use the combinatorial and geometric structure of the convex hull of orbits of circle actions on Euclidean space to give geometric proofs of the homotopy type of metric thickenings of the circle. Homotopical connectivity bounds of thickenings of the sphere allow us to prove that a weighted average of function values of odd maps $S^n to mathbb{R}^{n+2}$ on a small diameter set is zero. We prove an additional generalization of the Borsuk-Ulam theorem for odd maps $S^{2n-1} to mathbb{R}^{2kn+2n-1}$. We prove such results for odd maps from the circle to any Euclidean space with optimal quantitative bounds. This in turn implies that any raked homogeneous trigonometric polynomial has a zero on a subset of the circle of a specific diameter; these results are optimal.
This work shows that the following problems are equivalent, both in theory and in practice: - median filtering: given an $n$-element vector, compute the sliding window median with window size $k$, - piecewise sorting: given an $n$-element vector, divide it in $n/k$ blocks of length $k$ and sort each block. By prior work, median filtering is known to be at least as hard as piecewise sorting: with a single median filter operation we can sort $Theta(n/k)$ blocks of length $Theta(k)$. The present work shows that median filtering is also as easy as piecewise sorting: we can do median filtering with one piecewise sorting operation and linear-time postprocessing. In particular, median filtering can directly benefit from the vast literature on sorting algorithms---for example, adaptive sorting algorithms imply adaptive median filtering algorithms. The reduction is very efficient in practice---for random inputs the performance of the new sorting-based algorithm is on a par with the fastest heap-based algorithms, and for benign data distributions it typically outperforms prior algorithms. The key technical idea is that we can represent the sliding window with a pair of sorted doubly-linked lists: we delete items from one list and add items to the other list. Deletions are easy; additions can be done efficiently if we reverse the time twice: First we construct the full list and delete the items in the reverse order. Then we undo each deletion with Knuths dancing links technique.