Cluster Analysis via Random Partition Distributions


Abstract

Hierarchical and k-medoids clustering are deterministic clustering algorithms based on pairwise distances. Using these same pairwise distances, we propose a novel stochastic clustering method based on random partition distributions. We call our method CaviarPD, for cluster analysis via random partition distributions. CaviarPD first samples clusterings from a random partition distribution and then searches for the clustering estimate that minimizes an expected loss computed from these samples. We compare CaviarPD with hierarchical and k-medoids clustering through eight case studies. Cluster estimates based on our method are competitive with those of hierarchical and k-medoids clustering. Unlike hierarchical clustering, our method does not require a subjective choice of linkage method. Furthermore, our distribution-based procedure provides an intuitive graphical representation for assessing clustering uncertainty.
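
The two-stage idea described above (sample clusterings, then pick the estimate with the smallest expected loss) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the sampler is a simple distance-weighted sequential allocation standing in for the random partition distribution, the loss is Binder loss with equal costs, and the search is restricted to the sampled clusterings. The function names and the `mass` and `temperature` parameters are hypothetical.

```python
import math
import random


def sample_partition(dist, mass=1.0, temperature=1.0, rng=random):
    """Draw one clustering by sequential allocation (illustrative stand-in).

    Items are visited in random order. Each item joins an existing cluster
    with weight equal to the summed similarity exp(-temperature * d) to that
    cluster's members, or opens a new cluster with weight `mass`.
    """
    n = len(dist)
    order = list(range(n))
    rng.shuffle(order)
    labels = [-1] * n
    clusters = []  # each entry is the list of item indices in one cluster
    for i in order:
        weights = [
            sum(math.exp(-temperature * dist[i][j]) for j in members)
            for members in clusters
        ]
        weights.append(mass)  # last option: open a new cluster
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(clusters):
            clusters.append([])
        clusters[k].append(i)
        labels[i] = k
    return labels


def coclustering_matrix(samples):
    """Proportion of sampled clusterings in which items i and j share a cluster."""
    n = len(samples[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(samples)
    return [[c / m for c in row] for row in counts]


def expected_binder_loss(labels, psm):
    """Monte Carlo estimate of expected Binder loss (equal costs) of `labels`."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            same = 1.0 if labels[i] == labels[j] else 0.0
            total += abs(same - psm[i][j])
    return total


def estimate_clustering(dist, n_samples=500, seed=0, **sampler_args):
    """Sample clusterings, then return the sampled one with the lowest expected loss."""
    rng = random.Random(seed)
    samples = [sample_partition(dist, rng=rng, **sampler_args) for _ in range(n_samples)]
    psm = coclustering_matrix(samples)
    best = min(samples, key=lambda s: expected_binder_loss(s, psm))
    return best, psm
```

The co-clustering matrix returned alongside the estimate is one natural input for judging uncertainty: entries near 0 or 1 indicate pairs whose assignment is stable across samples, while entries near 0.5 indicate pairs the samples disagree on.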
