Feature selection is an important and challenging task in high-dimensional clustering. In genomics, for example, only a small number of genes may be differentially expressed and therefore informative about the overall clustering structure. Existing feature selection methods, such as Sparse K-means, rarely account for features that separate only a subset of clusters. In genomics, it is highly likely that a gene defines one subtype against all the others, or distinguishes one pair of subtypes but not the rest. In this paper, we propose a K-means-based clustering algorithm that discovers informative features and also identifies which cluster pairs each selected feature separates. The method is essentially an EM algorithm in which we introduce lasso-type constraints on each cluster pair in the M step, and make the E step possible by maximizing the raw cross-cluster distance instead of minimizing the intra-cluster distance. We demonstrate the method on simulated data and a leukemia gene expression dataset.
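To make the idea of pair-specific feature selection concrete, here is a minimal sketch in Python. It is not the algorithm proposed in the paper: it assumes cluster labels are already given, scores each feature for each cluster pair by the squared difference of cluster means (our assumption), and applies soft-thresholding, the usual lasso-type shrinkage operator. The function names, the threshold `delta`, and the toy data are all hypothetical.

```python
import numpy as np


def soft_threshold(x, delta):
    """Lasso-type shrinkage: move values toward zero by delta, clipping at zero."""
    return np.sign(x) * np.maximum(np.abs(x) - delta, 0.0)


def pairwise_feature_weights(X, labels, delta):
    """Return a dict mapping each cluster pair (k, l) to a per-feature weight vector.

    Illustrative assumption: the separation score of feature j for pair (k, l)
    is the squared difference of the two cluster means on feature j.
    Soft-thresholding sets weak scores exactly to zero.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    means = np.array([X[labels == k].mean(axis=0) for k in clusters])
    weights = {}
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            gap = (means[a] - means[b]) ** 2
            key = (clusters[a].item(), clusters[b].item())
            weights[key] = soft_threshold(gap, delta)
    return weights


# Toy data: feature 0 separates cluster 0 from clusters 1 and 2,
# feature 1 separates cluster 2 from clusters 0 and 1, feature 2 is noise.
X = np.array([
    [0.0, 0.0, 0.1],
    [0.2, 0.0, -0.1],
    [5.0, 0.1, 0.0],
    [5.2, -0.1, 0.0],
    [5.1, 4.0, 0.1],
    [5.1, 4.2, -0.1],
])
labels = [0, 0, 1, 1, 2, 2]

weights = pairwise_feature_weights(X, labels, delta=1.0)
for pair, w in weights.items():
    print(pair, np.nonzero(w)[0])  # indices of features kept for this pair
```

In this toy setting, a feature receives a nonzero weight only for the cluster pairs it actually separates, which is the kind of pair-level output the abstract describes. A full method would also have to estimate the cluster assignments, which this sketch takes as given.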
This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and
Penalized (or regularized) regression, as represented by Lasso and its variants, has become a standard technique for analyzing high-dimensional data when the number of variables substantially exceeds the sample size. The performance of penalized regr
In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of variance
Marginal maximum likelihood (MML) estimation is the preferred approach to fitting item response theory models in psychometrics due to the MML estimator's consistency, normality, and efficiency as the sample size tends to infinity. However, state-of-th
High-dimensional feature selection is a central problem in a variety of application domains such as machine learning, image analysis, and genomics. In this paper, we propose graph-based tests as a useful basis for feature selection. We describe an al