Many clustering algorithms are guided by certain cost functions such as the widely-used $k$-means cost. These algorithms divide data points into clusters with often complicated boundaries, creating difficulties in explaining the clustering decision. In a recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020) introduced explainable clustering, where the cluster boundaries are axis-parallel hyperplanes and the clustering is obtained by applying a decision tree to the data. The central question here is: how much does the explainability constraint increase the value of the cost function? Given $d$-dimensional data points, we show an efficient algorithm that finds an explainable clustering whose $k$-means cost is at most $k^{1 - 2/d}\,\mathrm{poly}(d \log k)$ times the minimum cost achievable by a clustering without the explainability constraint, assuming $k, d \ge 2$. Combining this with an independent work by Makarychev and Shan (ICML 2021), we get an improved bound of $k^{1 - 2/d}\,\mathrm{polylog}(k)$, which we show is optimal for every choice of $k, d \ge 2$ up to a poly-logarithmic factor in $k$. For $d = 2$ in particular, we show an $O(\log k \log\log k)$ bound, improving exponentially over the previous best bound of $\widetilde{O}(k)$.
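To make the threshold-tree picture concrete, here is a minimal sketch (our illustration, not the algorithm analyzed in the abstract): fit ordinary k-means, build a tree of axis-parallel cuts that separates the centers, route every point through the tree, and compare the two costs. The widest-gap cut rule and all parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_tree(center_ids, centers):
    """Axis-aligned decision tree separating the given k-means centers.
    At each node, cut the coordinate with the widest gap between
    consecutive center coordinates (a simple heuristic, not the
    mistake-minimizing cut analyzed in the literature)."""
    if len(center_ids) == 1:
        return center_ids[0]                      # leaf: a single center id
    sub = centers[center_ids]
    best = None                                   # (gap, feature, threshold)
    for j in range(centers.shape[1]):
        vals = np.sort(sub[:, j])
        gaps = np.diff(vals)
        g = int(gaps.argmax())
        if best is None or gaps[g] > best[0]:
            best = (gaps[g], j, (vals[g] + vals[g + 1]) / 2.0)
    _, j, t = best
    left = [c for c in center_ids if centers[c, j] <= t]
    right = [c for c in center_ids if centers[c, j] > t]
    return (j, t, build_tree(left, centers), build_tree(right, centers))

def route(x, node):
    """Follow axis-parallel cuts down to a leaf; return its center id."""
    while isinstance(node, tuple):
        j, t, lo, hi = node
        node = lo if x[j] <= t else hi
    return node

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 2)) + rng.choice([-6.0, 0.0, 6.0], size=(600, 2))
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
tree = build_tree(list(range(4)), km.cluster_centers_)
tree_labels = np.array([route(x, tree) for x in X])

def cost(labels):
    return ((X - km.cluster_centers_[labels]) ** 2).sum()

print("unconstrained k-means cost:", round(cost(km.labels_), 1))
print("explainable (tree) cost:   ", round(cost(tree_labels), 1))
```

The tree's leaves induce axis-parallel cluster boundaries by construction, so the second cost is an upper bound on the best explainable cost; the abstract's result bounds how large the ratio between the two can be.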
We construct near-optimal coresets for kernel density estimates for points in $\mathbb{R}^d$ when the kernel is positive definite. Specifically, we show a polynomial-time construction for a coreset of size $O(\sqrt{d}/\varepsilon \cdot \sqrt{\log 1/\varepsilon})$ …
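For orientation, this is what the coreset guarantee means operationally: a small subset $Q$ whose kernel density estimate stays uniformly $\varepsilon$-close to that of the full set $P$. The sketch below (our illustration, not the paper's construction) measures this sup-error for a plain uniform sample under a Gaussian kernel, which is positive definite.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kde(points, queries, bandwidth=1.0):
    """Gaussian-kernel (positive definite) density estimate of `points`,
    evaluated at each row of `queries`."""
    d2 = cdist(queries, points, "sqeuclidean")
    return np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)

rng = np.random.default_rng(0)
P = rng.normal(size=(5000, 2))                        # full point set in R^2
Q = P[rng.choice(len(P), size=200, replace=False)]    # candidate coreset

# Worst-case KDE discrepancy over a query grid; an eps-coreset
# bounds this sup-error by eps at *every* query point.
g = np.linspace(-4, 4, 40)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
print("sup-error of a 200-point sample:",
      float(np.abs(kde(P, grid) - kde(Q, grid)).max()))
```

Uniform sampling needs roughly $1/\varepsilon^2$ points to reach error $\varepsilon$; the point of the abstract's bound is that a carefully constructed coreset gets by with roughly $\sqrt{d}/\varepsilon$ points up to the logarithmic factor.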
We present a simple heuristic algorithm for efficiently optimizing the notoriously hard minimum sum-of-squares clustering problem, usually addressed by the classical k-means heuristic and its variants. The algorithm, called recombinator-k-means, …
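The abstract is cut off before the algorithm is described, so the following is only a guess at the flavor that the name "recombinator" suggests (hypothetical, not the paper's definition): keep a pool of centroids from earlier k-means restarts and seed later restarts by recombining pooled centroids, favoring those from low-cost solutions.

```python
import numpy as np
from sklearn.cluster import KMeans

def recombinator_style_kmeans(X, k, rounds=5, batch=8, seed=0):
    """Hypothetical recombination heuristic: pool centroids across
    restarts, then seed new restarts with k sampled pooled centroids,
    weighted toward centroids of low-cost solutions."""
    rng = np.random.default_rng(seed)
    pool, costs, best = [], [], None
    for r in range(rounds):
        for _ in range(batch):
            if r == 0:
                init = "k-means++"
            else:
                c = np.array(costs)
                w = np.exp(-(c - c.min()) / (c.std() + 1e-12))
                idx = rng.choice(len(pool), size=k, replace=False,
                                 p=w / w.sum())
                init = np.array(pool)[idx]
            km = KMeans(n_clusters=k, init=init, n_init=1).fit(X)
            pool.extend(km.cluster_centers_)          # k new centroids
            costs.extend([km.inertia_] * k)           # tagged with run cost
            if best is None or km.inertia_ < best.inertia_:
                best = km
    return best

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5)) + rng.integers(0, 4, size=(2000, 1)) * 3
print("best inertia:", recombinator_style_kmeans(X, k=4).inertia_)
```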
The $k$-means algorithm is one of the most classical clustering methods and has been widely and successfully used in signal processing. However, due to the thin-tailed property of the Gaussian distribution, the $k$-means algorithm suffers from …
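The truncated sentence gestures at the classical weakness of the squared-error objective: squared loss corresponds to a thin-tailed Gaussian model, so a few extreme points exert quadratic influence and can drag the cluster means far from the bulk of the data. A quick demonstration of the effect (our illustration; the abstract's proposed remedy is cut off):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated Gaussian clusters, then a few heavy-tailed
# (Cauchy, i.e. Student-t with df=1) outliers are mixed in.
rng = np.random.default_rng(2)
clean = np.concatenate([rng.normal(-5, 1, size=(200, 2)),
                        rng.normal(+5, 1, size=(200, 2))])
noisy = np.concatenate([clean, rng.standard_t(df=1, size=(20, 2)) * 50])

for name, data in [("clean data", clean), ("heavy-tailed noise", noisy)]:
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
    print(f"{name:>18}: centers =\n{np.round(km.cluster_centers_, 2)}")
```

On the clean data the centers sit near $(\pm 5, \pm 5)$; with the heavy-tailed contamination they are visibly displaced, which is exactly the failure mode the abstract points to.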
We combine K-means clustering with the least-squares kernel classification method. K-means clustering is used to extract a set of representative vectors for each class. The least-squares kernel method uses these representative vectors as a training set …
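The pipeline as described can be sketched directly: run K-means inside each class to get prototypes, then train a regularized least-squares kernel classifier on the prototypes only. The RBF kernel, one-vs-rest encoding, and all hyperparameters below are our assumptions, not taken from the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel

def fit(X, y, per_class=10, gamma=0.5, lam=1e-3):
    """Per-class K-means prototypes, then a regularized least-squares
    solve in kernel space over the prototypes."""
    classes = np.unique(y)
    protos = np.vstack([KMeans(n_clusters=per_class, n_init=10,
                               random_state=0).fit(X[y == c]).cluster_centers_
                        for c in classes])
    proto_y = np.repeat(classes, per_class)
    Y = np.where(proto_y[:, None] == classes[None, :], 1.0, -1.0)  # one-vs-rest
    K = rbf_kernel(protos, protos, gamma=gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(protos)), Y)
    return protos, alpha, classes, gamma

def predict(X, model):
    protos, alpha, classes, gamma = model
    return classes[(rbf_kernel(X, protos, gamma=gamma) @ alpha).argmax(axis=1)]

X, y = make_classification(n_samples=1000, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)
model = fit(X[:800], y[:800])
print("held-out accuracy:", (predict(X[800:], model) == y[800:]).mean())
```

Training on the prototypes rather than all points keeps the kernel system small (here $30 \times 30$ instead of $800 \times 800$), which is the practical appeal of this combination.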
Biclustering is the task of simultaneously clustering the rows and columns of the data matrix into different subgroups such that the rows and columns within a subgroup exhibit similar patterns. In this paper, we consider the case of producing block-diagonal …
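Although the abstract is cut off, the block-diagonal setting it names is standard: rows and columns are jointly partitioned so that, after reordering, the data matrix is approximately block-diagonal. A short generic demonstration with scikit-learn's spectral co-clustering (an off-the-shelf method, not the paper's):

```python
import numpy as np
from sklearn.datasets import make_biclusters
from sklearn.cluster import SpectralCoclustering

# Planted block structure: rows and columns are jointly partitioned so
# each (row-group, column-group) block is homogeneous.
X, rows, cols = make_biclusters(shape=(120, 80), n_clusters=4,
                                noise=5, random_state=0)
model = SpectralCoclustering(n_clusters=4, random_state=0).fit(X)

# Reordering rows and columns by their labels reveals the blocks.
reordered = X[np.argsort(model.row_labels_)][:, np.argsort(model.column_labels_)]
print("row group sizes:", np.bincount(model.row_labels_))
print("col group sizes:", np.bincount(model.column_labels_))
```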