Partitioned $K$-nearest neighbor local depth for scalable comparison-based learning


Abstract in English

A triplet comparison oracle on a set $S$ takes an object $x in S$ and for any pair ${y, z} subset S setminus {x}$ declares which of $y$ and $z$ is more similar to $x$. Partitioned Local Depth (PaLD) supplies a principled non-parametric partitioning of $S$ under such triplet comparisons but needs $O(n^2 log{n})$ oracle calls and $O(n^3)$ post-processing steps. We introduce Partitioned Nearest Neighbors Local Depth (PaNNLD), a computationally tractable variant of PaLD leveraging the $K$-nearest neighbors digraph on $S$. PaNNLD needs only $O(n K log{n})$ oracle calls, by replacing an oracle call by a coin flip when neither $y$ nor $z$ is adjacent to $x$ in the undirected version of the $K$-nearest neighbors digraph. By averaging over randomizations, PaNNLD subsequently requires (at best) only $O(n K^2)$ post-processing steps. Concentration of measure shows that the probability of randomization-induced error $delta$ in PaNNLD is no more than $2 e^{-delta^2 K^2}$.

Download