A classic problem in unsupervised learning and data analysis is to find simpler, easy-to-visualize representations of the data that preserve its essential properties. A widely used method to preserve the underlying hierarchical structure of the data while reducing its complexity is to find an embedding of the data into a tree or an ultrametric. The most popular algorithms for this task are the classic linkage algorithms (single, average, or complete). However, on a data set of $n$ points in $\Omega(\log n)$ dimensions, these methods exhibit a prohibitive running time of $\Theta(n^2)$. In this paper, we provide a new algorithm which takes as input a set of points $P$ in $\mathbb{R}^d$, and for every $c \ge 1$, runs in time $n^{1+\frac{\rho}{c^2}}$ (for some universal constant $\rho > 1$) and outputs an ultrametric $\Delta$ such that for any two points $u, v$ in $P$, $\Delta(u,v)$ is within a multiplicative factor of $5c$ of the distance between $u$ and $v$ in the best ultrametric representation of $P$. Here, the best ultrametric is the ultrametric $\tilde{\Delta}$ that minimizes the maximum distance distortion with respect to the $\ell_2$ distance, namely the one that minimizes $\max_{u,v \in P} \frac{\tilde{\Delta}(u,v)}{\|u-v\|_2}$. We complement the above result by showing that, under popular complexity-theoretic assumptions, for every constant $\varepsilon > 0$, no algorithm with running time $n^{2-\varepsilon}$ can distinguish between inputs in the $\ell_\infty$ metric that admit an isometric embedding and those that incur a distortion of $\frac{3}{2}$. Finally, we present an empirical evaluation on classic machine learning datasets and show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
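For intuition about the objective, the distortion of any candidate ultrametric against the $\ell_2$ metric can be computed directly. The sketch below (Python with SciPy; this is a baseline illustration, not the paper's algorithm) builds a complete-linkage ultrametric, which dominates the input distances, and evaluates its worst-case multiplicative distortion on synthetic data; the dataset size and dimension are arbitrary illustrative choices.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, cophenet

# Synthetic point set; size and dimension are illustrative choices.
rng = np.random.default_rng(0)
P = rng.standard_normal((200, 16))   # 200 points in R^16

d = pdist(P)                         # pairwise l2 distances (condensed form)
Z = linkage(d, method="complete")    # complete-linkage dendrogram
Delta = cophenet(Z)                  # cophenetic (ultrametric) distances

# Complete linkage merges clusters at the maximum inter-cluster distance,
# so Delta(u, v) >= ||u - v||_2 for every pair; the largest ratio is the
# multiplicative distortion of this ultrametric embedding.
distortion = np.max(Delta / d)
print(f"max distortion of complete-linkage ultrametric: {distortion:.3f}")
```

Note that single linkage would instead yield the subdominant ultrametric, with $\Delta(u,v) \le \|u-v\|_2$ for all pairs, so complete linkage is the natural linkage baseline for the dominating-ultrametric objective above.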
We consider minimum-cardinality Manhattan connected sets with arbitrary demands: Given a collection of points $P$ in the plane, together with a subset of pairs of points in $P$ (which we call demands), find a minimum-cardinality superset of $P$ such
Product measures of dimension $n$ are known to be concentrated in Hamming distance: for any set $S$ in the product space of probability $\epsilon$, a random point in the space, with probability $1-\delta$, has a neighbor in $S$ that is different from t
We present algorithmic results for the parallel assembly of many micro-scale objects in two and three dimensions from tiny particles, which has been proposed in the context of programmable matter and self-assembly for building high-yield micro-factor
We study the space complexity of solving the bias-regularized SVM problem in the streaming model. This is a classic supervised learning problem that has drawn considerable attention, including for developing fast algorithms for solving the problem approxi
A recent series of papers by Andoni, Naor, Nikolov, Razenshteyn, and Waingarten (STOC 2018, FOCS 2018) has given approximate near neighbour search (NNS) data structures for a wide class of distance metrics, including all norms. In particular, these d