Most existing clustering algorithms are designed without considering selection bias in the data. In many real-world applications, however, one cannot guarantee that the data is unbiased. Selection bias may introduce unexpected correlations between features, and ignoring these correlations degrades the performance of clustering algorithms. How to remove the unexpected correlations induced by selection bias is therefore extremely important, yet largely unexplored, for clustering. In this paper, we propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias. Specifically, the decorrelation regularizer learns global sample weights that balance the sample distribution, so as to remove unexpected correlations among features. The learned weights are then combined with k-means, so that the reweighted k-means clusters on the inherent data distribution, free from the influence of unexpected correlations. Moreover, we derive updating rules to effectively infer the parameters of DCKM. Extensive experimental results on real-world datasets demonstrate that DCKM achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias when clustering.
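The abstract describes an alternating scheme: a decorrelation regularizer that learns global sample weights by shrinking weighted feature correlations, interleaved with a k-means step that uses those weights. The NumPy sketch below illustrates this idea under stated assumptions; the projected-gradient weight update, step sizes, iteration counts, and the names `weight_gradient` and `dckm_sketch` are illustrative choices, not the paper's actual updating rules.

```python
# A minimal sketch of the idea behind DCKM, assuming the algorithm alternates
# two steps: (1) learn global sample weights that shrink the off-diagonal
# entries of the weighted feature covariance matrix (the decorrelation
# regularizer), and (2) run k-means where each centroid is the weighted mean
# of its assigned samples. This is an illustration, not the paper's method.
import numpy as np

def weight_gradient(w, Xc):
    """Gradient of sum_{j != k} C_jk^2 w.r.t. the sample weights w, where
    C is the weighted covariance matrix (centering held fixed within one
    outer iteration, and mean(w) kept at 1 so that w / n normalizes C)."""
    n = Xc.shape[0]
    C = (Xc * (w / n)[:, None]).T @ Xc
    off = C - np.diag(np.diag(C))          # keep only cross-feature terms
    # grad_i = (2/n) * xc_i^T * off * xc_i
    return (2.0 / n) * np.einsum('ij,jk,ik->i', Xc, off, Xc)

def dckm_sketch(X, k, outer_iters=20, w_steps=50, lr=1e-3, seed=0):
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = np.ones(n)                          # global sample weights
    centers = X[rng.choice(n, k, replace=False)]
    labels = np.zeros(n, dtype=int)
    for _ in range(outer_iters):
        # --- Step 1: decorrelation regularizer on the sample weights ---
        Xc = X - (w / w.sum()) @ X          # weighted feature centering
        for _ in range(w_steps):
            w = w - lr * weight_gradient(w, Xc)
            w = np.clip(w, 1e-6, None)      # nonnegative weights
            w *= n / w.sum()                # keep weights at mean 1
        # --- Step 2: weighted k-means with the learned weights ---
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for c in range(k):
            mask = labels == c
            if mask.any():
                centers[c] = np.average(X[mask], axis=0, weights=w[mask])
    return labels, w, centers
```

Rescaling the weights to mean 1 after each gradient step is one simple way to rule out the degenerate all-zero solution; the constraints and updating rules actually derived in the paper may differ.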