ﻻ يوجد ملخص باللغة العربية
k-means is a widely used clustering algorithm, but for $k$ clusters and a dataset size of $N$, each iteration of Lloyds algorithm costs $O(kN)$ time. Although there are existing techniques to accelerate single Lloyd iterations, none of these are tailored to the case of large $k$, which is increasingly common as dataset sizes grow. We propose a dual-tree algorithm that gives the exact same results as standard $k$-means; when using cover trees, we use adaptive analysis techniques to, under some assumptions, bound the single-iteration runtime of the algorithm as $O(N + k log k)$. To our knowledge these are the first sub-$O(kN)$ bounds for exact Lloyd iterations. We then show that this theoretically favorable algorithm performs competitively in practice, especially for large $N$ and $k$ in low dimensions. Further, the algorithm is tree-independent, so any type of tree may be used.
By a well known result the treewidth of k-outerplanar graphs is at most 3k-1. This paper gives, besides a rigorous proof of this fact, an algorithmic implementation of the proof, i.e. it is shown that, given a k-outerplanar graph G, a tree decomposit
Dual-tree algorithms are a widely used class of branch-and-bound algorithms. Unfortunately, developing dual-tree algorithms for use with different trees and problems is often complex and burdensome. We introduce a four-part logical split: the tree, t
We show how to approximate a data matrix $mathbf{A}$ with a much smaller sketch $mathbf{tilde A}$ that can be used to solve a general class of constrained k-rank approximation problems to within $(1+epsilon)$ error. Importantly, this class of problem
This paper considers $k$-means clustering in the presence of noise. It is known that $k$-means clustering is highly sensitive to noise, and thus noise should be removed to obtain a quality solution. A popular formulation of this problem is called $k$
For a clustered graph, i.e, a graph whose vertex set is recursively partitioned into clusters, the C-Planarity Testing problem asks whether it is possible to find a planar embedding of the graph and a representation of each cluster as a region homeom