Approximate Clustering via Metric Partitioning

Published by Sayan Bandyapadhyay
Publication date: 2015
Research field: Informatics Engineering
Language: English





In this paper we consider two metric covering/clustering problems: the \textit{Minimum Cost Covering Problem} (MCC) and $k$-clustering. In the MCC problem, we are given two point sets $X$ (clients) and $Y$ (servers), and a metric on $X \cup Y$. We would like to cover the clients by balls centered at the servers. The objective function to minimize is the sum of the $\alpha$-th power of the radii of the balls. Here $\alpha \geq 1$ is a parameter of the problem (but not of a problem instance). MCC is closely related to the $k$-clustering problem. The main difference between $k$-clustering and MCC is that in $k$-clustering one needs to select $k$ balls to cover the clients. For any $\epsilon > 0$, we describe quasi-polynomial time $(1 + \epsilon)$ approximation algorithms for both of the problems. However, in case of $k$-clustering the algorithm uses $(1 + \epsilon)k$ balls. Prior to our work, a $3^{\alpha}$ and a $c^{\alpha}$ approximation were achieved by polynomial-time algorithms for MCC and $k$-clustering, respectively, where $c > 1$ is an absolute constant. These two problems are thus interesting examples of metric covering/clustering problems that admit $(1 + \epsilon)$-approximation (using $(1+\epsilon)k$ balls in case of $k$-clustering), if one is willing to settle for quasi-polynomial time. In contrast, for the variant of MCC where $\alpha$ is part of the input, we show under standard assumptions that no polynomial time algorithm can achieve an approximation factor better than $O(\log |X|)$ for $\alpha \geq \log |X|$.
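
To make the MCC objective concrete, the following minimal sketch evaluates the cost of a given set of balls and checks that they cover all clients. It is not the paper's approximation algorithm. The function names `mcc_cost` and `covers`, the sample points, and the use of Euclidean distance as the metric are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, standing in for the abstract metric (assumption)


def mcc_cost(balls, alpha):
    """Sum of the alpha-th powers of the radii of the chosen balls."""
    return sum(radius ** alpha for _, radius in balls)


def covers(clients, balls):
    """True if every client lies inside at least one ball."""
    return all(
        any(dist(client, center) <= radius for center, radius in balls)
        for client in clients
    )


# Hypothetical instance: three clients on a line, servers at (1, 0) and (5, 0).
clients = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
one_ball = [((1.0, 0.0), 4.0)]                        # a single large ball
two_balls = [((1.0, 0.0), 1.0), ((5.0, 0.0), 0.0)]    # two small balls

for alpha in (1, 2):
    print(alpha, mcc_cost(one_ball, alpha), mcc_cost(two_balls, alpha))
```

Both candidate solutions cover all clients in this toy instance, but the gap between their costs widens as $\alpha$ grows, because larger exponents penalize large radii more heavily.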




Read also

Solomon and Elkin constructed a shortcutting scheme for weighted trees which results in a 1-spanner for the tree metric induced by the input tree. The spanner has logarithmic lightness, logarithmic diameter, a linear number of edges and bounded degree (provided the input tree has bounded degree). This spanner has been applied in a series of papers devoted to designing bounded degree, low-diameter, low-weight $(1+\epsilon)$-spanners in Euclidean and doubling metrics. In this paper, we present a simple local routing algorithm for this tree metric spanner. The algorithm has a routing ratio of 1, is guaranteed to terminate after $O(\log n)$ hops and requires $O(\Delta \log n)$ bits of storage per vertex where $\Delta$ is the maximum degree of the tree on which the spanner is constructed. This local routing algorithm can be adapted to a local routing algorithm for a doubling metric spanner which makes use of the shortcutting scheme.
\textit{Clustering problems} often arise in fields such as data mining and machine learning, where the goal is to group a collection of objects into similar groups with respect to a similarity (or dissimilarity) measure. Among the clustering problems, \textit{$k$-means} clustering in particular has received much attention from researchers. Despite the fact that $k$-means is a very well studied problem, its status in the plane is still open. In particular, it is unknown whether it admits a PTAS in the plane. The best known approximation bound in polynomial time is $9+\epsilon$. In this paper, we consider the following variant of $k$-means. Given a set $C$ of points in $\mathcal{R}^d$ and a real $f > 0$, find a finite set $F$ of points in $\mathcal{R}^d$ that minimizes the quantity $f \cdot |F| + \sum_{p \in C} \min_{q \in F} \|p-q\|^2$. For any fixed dimension $d$, we design a local search PTAS for this problem. We also give a bi-criterion local search algorithm for $k$-means which uses $(1+\epsilon)k$ centers and yields a solution whose cost is at most $(1+\epsilon)$ times the cost of an optimal $k$-means solution. The algorithm runs in polynomial time for any fixed dimension. The contribution of this paper is twofold. On the one hand, we are able to handle the square of distances in an elegant manner, which yields a near optimal approximation bound. This leads us towards a better understanding of the $k$-means problem. On the other hand, our analysis of local search might also be useful for other geometric problems. This is important considering that very little is known about the local search method for geometric approximation.
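
As a small illustration of the objective in the $k$-means variant described above, the sketch below evaluates the cost of a candidate center set $F$: a per-center charge $f$ plus the squared distance from each point to its nearest center. It only evaluates the objective and does not implement the local search. The function name `facility_kmeans_cost` and the sample data are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def facility_kmeans_cost(C, F, f):
    """Opening cost f per center plus each point's squared distance to its nearest center."""
    return f * len(F) + sum(min(squared_distance(p, q) for q in F) for p in C)


# Hypothetical instance: two nearby points and one outlier.
C = [(0, 0), (2, 0), (10, 0)]
print(facility_kmeans_cost(C, [(1, 0)], f=3))           # one center
print(facility_kmeans_cost(C, [(1, 0), (10, 0)], f=3))  # two centers
```

In this toy instance, opening a second center at the outlier is worth the extra charge $f$, which is the trade-off the objective encodes: the number of centers is not fixed in advance but paid for.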
In 2015, Driemel, Krivošija and Sohler introduced the $(k,\ell)$-median problem for clustering polygonal curves under the Fréchet distance. Given a set of input curves, the problem asks to find $k$ median curves of at most $\ell$ vertices each that minimize the sum of Fréchet distances over all input curves to their closest median curve. A major shortcoming of their algorithm is that the input curves are restricted to lie on the real line. In this paper, we present a randomized bicriteria-approximation algorithm that works for polygonal curves in $\mathbb{R}^d$ and achieves approximation factor $(1+\epsilon)$ with respect to the clustering costs. The algorithm has worst-case running-time linear in the number of curves, polynomial in the maximum number of vertices per curve, i.e. their complexity, and exponential in $d$, $\ell$, $\epsilon$ and $\delta$, i.e., the failure probability. We achieve this result through a shortcutting lemma, which guarantees the existence of a polygonal curve with similar cost as an optimal median curve of complexity $\ell$, but of complexity at most $2\ell-2$, and whose vertices can be computed efficiently. We combine this lemma with the superset-sampling technique by Kumar et al. to derive our clustering result. In doing so, we describe and analyze a generalization of the algorithm by Ackermann et al., which may be of independent interest.
Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel \emph{trellis} data structure. This combination results in an exact algorithm that scales beyond previous state of the art, from a search space with $10^{12}$ trees to $10^{15}$ trees, and an approximate algorithm that improves over baselines, even in enormous search spaces that contain more than $10^{1000}$ trees. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks. We describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.
In this paper, we report progress on answering the open problem presented by Pagh [14], who considered the nearest neighbor search without false negatives for the Hamming distance. We show new data structures for solving the $c$-approximate nearest neighbors problem without false negatives for Euclidean high dimensional space $\mathcal{R}^d$. These data structures work for any $c = \omega(\sqrt{\log \log n})$, where $n$ is the number of points in the input set, with poly-logarithmic query time and polynomial preprocessing time. This improves over the known algorithms, which require $c$ to be $\Omega(\sqrt{d})$. This improvement is obtained by applying a sequence of reductions, which are interesting on their own. First, we reduce the problem to $d$ instances of dimension logarithmic in $n$. Next, these instances are reduced to a number of $c$-approximate nearest neighbor search instances in $\big(\mathbb{R}^k\big)^L$ space equipped with metric $m(x,y) = \max_{1 \le i \le L} \lVert x_i - y_i \rVert_2$.