ﻻ يوجد ملخص باللغة العربية
Hierarchical and k-medoids clustering are deterministic clustering algorithms based on pairwise distances. Using these same pairwise distances, we propose a novel stochastic clustering method based on random partition distributions. We call our method CaviarPD, for cluster analysis via random partition distributions. CaviarPD first samples clusterings from a random partition distribution and then finds the best cluster estimate based on these samples using algorithms to minimize an expected loss. We compare CaviarPD with hierarchical and k-medoids clustering through eight case studies. Cluster estimates based on our method are competitive with those of hierarchical and k-medoids clustering. They also do not require the subjective choice of the linkage method necessary for hierarchical clustering. Furthermore, our distribution-based procedure provides an intuitive graphical representation to assess clustering uncertainty.
Traditional Bayesian random partition models assume that the size of each cluster grows linearly with the number of data points. While this is appealing for some applications, this assumption is not appropriate for other tasks such as entity resoluti
We introduce a new approximate multiresolution analysis (MRA) using a single Gaussian as the scaling function, which we call Gaussian MRA (GMRA). As an initial application, we employ this new tool to accurately and efficiently compute the probability
Traditional principal component analysis (PCA) is well known in high-dimensional data analysis, but it requires to express data by a matrix with observations to be continuous. To overcome the limitations, a new method called flexible PCA (FPCA) for e
Understanding the heterogeneity over spatial locations is an important problem that has been widely studied in many applications such as economics and environmental science. In this paper, we focus on regression models for spatial panel data analysis
Proteins constitute a large group of macromolecules with a multitude of functions for all living organisms. Proteins achieve this by adopting distinct three-dimensional structures encoded by the sequence of their constituent amino acids in one or mor