ﻻ يوجد ملخص باللغة العربية
Kernel methods are popular in clustering due to their generality and discriminating power. However, we show that many kernel clustering criteria have density biases theoretically explaining some practically significant artifacts empirically observed in the past. For example, we provide conditions and formally prove the density mode isolation bias in kernel K-means for a common class of kernels. We call it Breimans bias due to its similarity to the histogram mode isolation previously discovered by Breiman in decision tree learning with Gini impurity. We also extend our analysis to other popular kernel clustering methods, e.g. average/normalized cut or dominant sets, where density biases can take different forms. For example, splitting isolated points by cut-based criteria is essentially the sparsest subset bias, which is the opposite of the density mode bias. Our findings suggest that a principled solution for density biases in kernel clustering should directly address data inhomogeneity. We show that density equalization can be implicitly achieved using either locally adaptive weights or locally adaptive kernels. Moreover, density equalization makes many popular kernel clustering objectives equivalent. Our synthetic and real data experiments illustrate density biases and proposed solutions. We anticipate that theoretical understanding of kernel clustering limitations and their principled solutions will be important for a broad spectrum of data analysis applications across the disciplines.
We introduce a density-based clustering method called skeleton clustering that can detect clusters in multivariate and even high-dimensional data with irregular shapes. To bypass the curse of dimensionality, we propose surrogate density measures that
Single-level density-based approach has long been widely acknowledged to be a conceptually and mathematically convincing clustering method. In this paper, we propose an algorithm called best-scored clustering forest that can obtain the optimal level
Driving styles have a great influence on vehicle fuel economy, active safety, and drivability. To recognize driving styles of path-tracking behaviors for different divers, a statistical pattern-recognition method is developed to deal with the uncerta
We propose a new segmentation model combining common regularization energies, e.g. Markov Random Field (MRF) potentials, and standard pairwise clustering criteria like Normalized Cut (NC), average association (AA), etc. These clustering and regulariz
We propose a deep learning approach for discovering kernels tailored to identifying clusters over sample data. Our neural network produces sample embeddings that are motivated by--and are at least as expressive as--spectral clustering. Our training o