High Dimensional Clustering with $r$-nets

87 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Georgia Avarikioti

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Georgia Avarikioti - Alain Ryser - Yuyi Wang

الهندسة الحسابية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known structure, so-called $r$-nets, which rigorously captures the properties of clustering. We devise algorithms that improve the run-time of approximating $r$-nets in high-dimensional spaces with $ell_1$ and $ell_2$ metrics from $tilde{O}(dn^{2-Theta(sqrt{epsilon})})$ to $tilde{O}(dn + n^{2-alpha})$, where $alpha = Omega({epsilon^{1/3}}/{log(1/epsilon)})$. These algorithms are also used to improve a framework that provides approximate solutions to other high dimensional distance problems. Using this framework, several important related problems can also be solved efficiently, e.g., $(1+epsilon)$-approximate $k$th-nearest neighbor distance, $(4+epsilon)$-approximate Min-Max clustering, $(4+epsilon)$-approximate $k$-center clustering. In addition, we build an algorithm that $(1+epsilon)$-approximates greedy permutations in time $tilde{O}((dn + n^{2-alpha}) cdot log{Phi})$ where $Phi$ is the spread of the input. This algorithm is used to $(2+epsilon)$-approximate $k$-center with the same time complexity.

قيم البحث

140 - Orr Dunkelman , Zeev Geyzel , Chaya Keller 2020

In standard rounding, we want to map each value $X$ in a large continuous space (e.g., $R$) to a nearby point $P$ from a discrete subset (e.g., $Z$). This process seems to be inherently discontinuous in the sense that two consecutive noisy measuremen ts $X_1$ and $X_2$ of the same value may be extremely close to each other and yet they can be rounded to different points $P_1 e P_2$, which is undesirable in many applications. In this paper we show how to make the rounding process perfectly continuous in the sense that it maps any pair of sufficiently close measurements to the same point. We call such a process consistent rounding, and make it possible by allowing a small amount of information about the first measurement $X_1$ to be unidirectionally communicated to and used by the rounding process of $X_2$. The fault tolerance of a consistent rounding scheme is defined by the maximum distance between pairs of measurements which guarantees that they are always rounded to the same point, and our goal is to study the possible tradeoffs between the amount of information provided and the achievable fault tolerance for various types of spaces. When the measurements $X_i$ are arbitrary vectors in $R^d$, we show that communicating $log_2(d+1)$ bits of information is both sufficient and necessary (in the worst case) in order to achieve consistent rounding for some positive fault tolerance, and when d=3 we obtain a tight upper and lower asymptotic bound of $(0.561+o(1))k^{1/3}$ on the achievable fault tolerance when we reveal $log_2(k)$ bits of information about how $X_1$ was rounded. We analyze the problem by considering the possible colored tilings of the space with $k$ available colors, and obtain our upper and lower bounds with a variety of mathematical techniques including isoperimetric inequalities, the Brunn-Minkowski theorem, sphere packing bounds, and v{C}ech cohomology.

الهندسة الحسابية

A variational approach to high-order r-adaptation

45 - Julian Marcon , Michael Turner , David Moxey 2019

A variational framework, initially developed for high-order mesh optimisation, is being extended for r-adaptation. The method is based on the minimisation of a functional of the mesh deformation. To achieve adaptation, elements of the initial mesh ar e manipulated using metric tensors to obtain target elements. The nonlinear optimisation in turns adapts the final high-order mesh to best fit the description of the target elements by minimising the element distortion. Encouraging preliminary results prove that the method behaves well and can be used in the future for more extensive work which shall include the use of error indicators from CFD simulations.

الهندسة الحسابية الفيزياء الحسابية

Framework for $exists mathbb{R}$-Completeness of Two-Dimensional Packing Problems

86 - Mikkel Abrahamsen , Tillmann Miltzow , Nadja Seiferth 2020

We show that many natural two-dimensional packing problems are algorithmically equivalent to finding real roots of multivariate polynomials. A two-dimensional packing problem is defined by the type of pieces, containers, and motions that are allowed. The aim is to decide if a given set of pieces can be placed inside a given container. The pieces must be placed so that they are pairwise interior-disjoint, and only motions of the allowed type can be used to move them there. We establish a framework which enables us to show that for many combinations of allowed pieces, containers, and motions, the resulting problem is $existsmathbb R$-complete. This means that the problem is equivalent (under polynomial time reductions) to deciding whether a given system of polynomial equations and inequalities with integer coefficients has a real solution. We consider packing problems where only translations are allowed as the motions, and problems where arbitrary rigid motions are allowed, i.e., both translations and rotations. When rotations are allowed, we show that the following combinations of allowed pieces and containers are $existsmathbb R$-complete: $bullet$ simple polygons, each of which has at most 8 corners, into a square, $bullet$ convex objects bounded by line segments and hyperbolic curves into a square, $bullet$ convex polygons into a container bounded by line segments and hyperbolic curves. Restricted to translations, we show that the following combinations are $existsmathbb R$-complete: $bullet$ objects bounded by segments and hyperbolic curves into a square, $bullet$ convex polygons into a container bounded by segments and hyperbolic curves.

الهندسة الحسابية التعقيد الحسابي الرياضيات المتقطعة

Discrete Voronoi Games and $epsilon$-Nets, in Two and Three Dimensions

130 - Aritra Banik , Jean-Lou De Carufel , Anil Maheshwari 2015

The one-round discrete Voronoi game, with respect to a $n$-point user set $U$, consists of two players Player 1 ($mathcal{P}_1$) and Player 2 ($mathcal{P}_2$). At first, $mathcal{P}_1$ chooses a set of facilities $F_1$ following which $mathcal{P}_2$ chooses another set of facilities $F_2$, disjoint from $F_1$. The payoff of $mathcal{P}_2$ is defined as the cardinality of the set of points in $U$ which are closer to a facility in $F_2$ than to every facility in $F_1$, and the payoff of $mathcal{P}_1$ is the difference between the number of users in $U$ and the payoff of $mathcal{P}_2$. The objective of both the players in the game is to maximize their respective payoffs. In this paper we study the one-round discrete Voronoi game where $mathcal{P}_1$ places $k$ facilities and $mathcal{P}_2$ places one facility and we have denoted this game as $VG(k,1)$. Although the optimal solution of this game can be found in polynomial time, the polynomial has a very high degree. In this paper, we focus on achieving approximate solutions to $VG(k,1)$ with significantly better running times. We provide a constant-factor approximate solution to the optimal strategy of $mathcal{P}_1$ in $VG(k,1)$ by establishing a connection between $VG(k,1)$ and weak $epsilon$-nets. To the best of our knowledge, this is the first time that Voronoi games are studied from the point of view of $epsilon$-nets.

الهندسة الحسابية

Clustering High Dimensional Dynamic Data Streams

138 - Vladimir Braverman , Gereon Frahling , Harry Lang 2017

We present data streaming algorithms for the $k$-median problem in high-dimensional dynamic geometric data streams, i.e. streams allowing both insertions and deletions of points from a discrete Euclidean space ${1, 2, ldots Delta}^d$. Our algorithms use $k epsilon^{-2} poly(d log Delta)$ space/time and maintain with high probability a small weighted set of points (a coreset) such that for every set of $k$ centers the cost of the coreset $(1+epsilon)$-approximates the cost of the streamed point set. We also provide algorithms that guarantee only positive weights in the coreset with additional logarithmic factors in the space and time complexities. We can use this positively-weighted coreset to compute a $(1+epsilon)$-approximation for the $k$-median problem by any efficient offline $k$-median algorithm. All previous algorithms for computing a $(1+epsilon)$-approximation for the $k$-median problem over dynamic data streams required space and time exponential in $d$. Our algorithms can be generalized to metric spaces of bounded doubling dimension.

بنى وهياكل البيانات والخوارزميات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة البعث

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

High Dimensional Clustering with $r$-nets

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً