No Arabic abstract
We study a discrete version of a geometric stable marriage problem originally proposed in a continuous setting by Hoffman, Holroyd, and Peres, in which points in the plane are stably matched to cluster centers, as prioritized by their distances, so that each cluster center is apportioned a set of points of equal area. We show that, for a discretization of the problem to an $ntimes n$ grid of pixels with $k$ centers, the problem can be solved in time $O(n^2 log^5 n)$, and we experiment with two slower but more practical algorithms and a hybrid method that switches from one of these algorithms to the other to gain greater efficiency than either algorithm alone. We also show how to combine geometric stable matchings with a $k$-means clustering algorithm, so as to provide a geometric political-districting algorithm that views distance in economic terms, and we experiment with weight
In the past few years powerful generalizations to the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9,18], and tensor clustering [8,34]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximation algorithms of varying degrees of sophistication for k-means, k-medians, and more recently also for Bregman clustering [2]. However, there seem to be no approximation algorithms for Bregman co- and tensor clustering. In this paper we derive the first (to our knowledge) guaranteed methods for these increasingly important clustering settings. Going beyond Bregman divergences, we also prove an approximation factor for tensor clustering with arbitrary separable metrics. Through extensive experiments we evaluate the characteristics of our method, and show that it also has practical impact.
This paper presents universal algorithms for clustering problems, including the widely studied $k$-median, $k$-means, and $k$-center objectives. The input is a metric space containing all potential client locations. The algorithm must select $k$ cluster centers such that they are a good solution for any subset of clients that actually realize. Specifically, we aim for low regret, defined as the maximum over all subsets of the difference between the cost of the algorithms solution and that of an optimal solution. A universal algorithms solution $SOL$ for a clustering problem is said to be an $(alpha, beta)$-approximation if for all subsets of clients $C$, it satisfies $SOL(C) leq alpha cdot OPT(C) + beta cdot MR$, where $OPT(C)$ is the cost of the optimal solution for clients $C$ and $MR$ is the minimum regret achievable by any solution. Our main results are universal algorithms for the standard clustering objectives of $k$-median, $k$-means, and $k$-center that achieve $(O(1), O(1))$-approximations. These results are obtained via a novel framework for universal algorithms using linear programming (LP) relaxations. These results generalize to other $ell_p$-objectives and the setting where some subset of the clients are fixed. We also give hardness results showing that $(alpha, beta)$-approximation is NP-hard if $alpha$ or $beta$ is at most a certain constant, even for the widely studied special case of Euclidean metric spaces. This shows that in some sense, $(O(1), O(1))$-approximation is the strongest type of guarantee obtainable for universal clustering.
We consider the problem of finding textit{semi-matching} in bipartite graphs which is also extensively studied under various names in the scheduling literature. We give faster algorithms for both weighted and unweighted case. For the weighted case, we give an $O(nmlog n)$-time algorithm, where $n$ is the number of vertices and $m$ is the number of edges, by exploiting the geometric structure of the problem. This improves the classical $O(n^3)$ algorithms by Horn [Operations Research 1973] and Bruno, Coffman and Sethi [Communications of the ACM 1974]. For the unweighted case, the bound could be improved even further. We give a simple divide-and-conquer algorithm which runs in $O(sqrt{n}mlog n)$ time, improving two previous $O(nm)$-time algorithms by Abraham [MSc thesis, University of Glasgow 2003] and Harvey, Ladner, Lovasz and Tamir [WADS 2003 and Journal of Algorithms 2006]. We also extend this algorithm to solve the textit{Balance Edge Cover} problem in $O(sqrt{n}mlog n)$ time, improving the previous $O(nm)$-time algorithm by Harada, Ono, Sadakane and Yamashita [ISAAC 2008].
Let $A$ and $B$ be two point sets in the plane of sizes $r$ and $n$ respectively (assume $r leq n$), and let $k$ be a parameter. A matching between $A$ and $B$ is a family of pairs in $A times B$ so that any point of $A cup B$ appears in at most one pair. Given two positive integers $p$ and $q$, we define the cost of matching $M$ to be $c(M) = sum_{(a, b) in M}|{a-b}|_p^q$ where $|{cdot}|_p$ is the $L_p$-norm. The geometric partial matching problem asks to find the minimum-cost size-$k$ matching between $A$ and $B$. We present efficient algorithms for geometric partial matching problem that work for any powers of $L_p$-norm matching objective: An exact algorithm that runs in $O((n + k^2) {mathop{mathrm{polylog}}} n)$ time, and a $(1 + varepsilon)$-approximation algorithm that runs in $O((n + ksqrt{k}) {mathop{mathrm{polylog}}} n cdot logvarepsilon^{-1})$ time. Both algorithms are based on the primal-dual flow augmentation scheme; the main improvements involve using dynamic data structures to achieve efficient flow augmentations. With similar techniques, we give an exact algorithm for the planar transportation problem running in $O(min{n^2, rn^{3/2}} {mathop{mathrm{polylog}}} n)$ time.
For over a decade now we have been witnessing the success of {em massive parallel computation} (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is though: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the {em maximum matching} problem---one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in $O(log{n})$ rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. showed that if each machine has $n^{1+Omega(1)}$ memory, this problem can also be solved $2$-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow up work, seem though to get stuck in a fundamental way at roughly $O(log{n})$ rounds once we enter the near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that perplexing possibility. That is, we break the above $O(log n)$ round complexity bound even in the case of {em slightly sublinear} memory per machine. In fact, our improvement here is {em almost exponential}: we are able to deliver a $(2+epsilon)$-approximation to maximum matching, for any fixed constant $epsilon>0$, in $O((log log n)^2)$ rounds.