
Duality between Feature Selection and Data Clustering

Added by Chung Chan
Publication date: 2016
Language: English





The feature-selection problem is formulated from an information-theoretic perspective. We show that the problem can be efficiently solved by an extension of the recently proposed info-clustering paradigm. This reveals the fundamental duality between feature selection and data clustering, which is a consequence of the more general duality between the principal partition and the principal lattice of partitions in combinatorial optimization.
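As a rough illustration of the info-clustering side of this duality, the sketch below evaluates the multivariate mutual information I(Z_V) = min over partitions P with |P| >= 2 of (sum_{C in P} H(Z_C) - H(Z_V)) / (|P| - 1) by brute force over set partitions of a small discrete joint distribution. The brute-force enumeration and the toy pmf are illustrative assumptions; the paper's point is precisely that this combinatorial structure (the principal lattice of partitions) lets one avoid such enumeration.

```python
from math import log2

def entropy(pmf, subset):
    """Entropy H(Z_C) of the marginal of `pmf` on index subset C."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in subset)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def partitions(items):
    """Yield all set partitions of `items` (exponential; toy sizes only)."""
    if len(items) == 1:
        yield [items]
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def multivariate_mi(pmf, n):
    """I(Z_V) = min over partitions P (|P| >= 2) of
    (sum_{C in P} H(Z_C) - H(Z_V)) / (|P| - 1)."""
    h_all = entropy(pmf, range(n))
    return min((sum(entropy(pmf, c) for c in part) - h_all) / (len(part) - 1)
               for part in partitions(list(range(n))) if len(part) >= 2)

# Toy example: Z1 is an exact copy of Z0; Z2 is an independent fair bit.
pmf = {(z0, z0, z2): 0.25 for z0 in (0, 1) for z2 in (0, 1)}
print(multivariate_mi(pmf, 3))  # 0.0 -- the independent Z2 drags I(Z_V) to 0
print(multivariate_mi({(z0, z0): 0.5 for z0 in (0, 1)}, 2))  # 1.0 = I(Z0;Z1)
```

The minimizing partition splits off the independent bit while keeping the perfectly correlated pair together, which is the clustering behaviour the duality exploits.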



Related research

Selecting a minimal feature set that is maximally informative about a target variable is a central task in machine learning and statistics. Information theory provides a powerful framework for formulating feature selection algorithms -- yet, a rigorous, information-theoretic definition of feature relevancy, which accounts for feature interactions such as redundant and synergistic contributions, is still missing. We argue that this lack is inherent to classical information theory, which does not provide measures to decompose the information a set of variables provides about a target into unique, redundant, and synergistic contributions. Such a decomposition has been introduced only recently by the partial information decomposition (PID) framework. Using PID, we clarify why feature selection is a conceptually difficult problem when approached using information theory and provide a novel definition of feature relevancy and redundancy in PID terms. From this definition, we show that the conditional mutual information (CMI) maximizes relevancy while minimizing redundancy and propose an iterative, CMI-based algorithm for practical feature selection. We demonstrate the power of our CMI-based algorithm in comparison to the unconditional mutual information on benchmark examples and provide corresponding PID estimates to highlight how PID allows one to quantify the information contributions of features and their interactions in feature-selection problems.
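A minimal sketch of the iterative, CMI-based selection rule the abstract describes, using plug-in (empirical) entropies on discrete data. The estimator, the greedy loop, and the toy dataset are assumptions for illustration, not the authors' code.

```python
import numpy as np
from collections import Counter

def H(cols):
    """Empirical joint entropy (bits) of the rows of a 2-D array."""
    p = np.array(list(Counter(map(tuple, cols)).values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

def cmi(x, y, S):
    """Plug-in estimate of I(X; Y | S) = H(X,S) + H(Y,S) - H(X,Y,S) - H(S)."""
    hs = H(np.column_stack(S)) if S else 0.0
    return (H(np.column_stack([x] + S)) + H(np.column_stack([y] + S))
            - H(np.column_stack([x, y] + S)) - hs)

def greedy_cmi(X, y, k):
    """Iteratively add the feature with the largest CMI given those selected."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        cols = [X[:, j] for j in selected]
        best = max(remaining, key=lambda i: cmi(X[:, i], y, cols))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
x0, x2 = rng.integers(0, 2, (2, 4000))
x1 = x0.copy()          # redundant copy of x0
y = x0 ^ x2             # x0 and x2 contribute synergistically to y
X = np.column_stack([x0, x1, x2])
print(greedy_cmi(X, y, 2))  # e.g. [0, 2]: once one XOR input is in, CMI
                            # prefers its synergistic partner over the copy
```

Because the target is an XOR, no single feature is informative on its own; but once one XOR input is selected, the CMI of its partner jumps to one bit while the redundant copy stays at zero.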
In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of variance in the data while having only a limited number of nonzero coefficients. PCA is often used as a simple clustering technique and sparse factors allow us here to interpret the clusters in terms of a reduced set of variables. We begin with a brief introduction and motivation on sparse PCA and detail our implementation of the algorithm in d'Aspremont et al. (2005). We then apply these results to some classic clustering and feature selection problems arising in biology.
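The idea is easy to try with scikit-learn's SparsePCA, which implements an l1-penalized dictionary-learning formulation rather than the semidefinite relaxation of d'Aspremont et al. (2005); the synthetic data and the alpha value below are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
z = rng.normal(size=(300, 2))                  # two latent factors
X = np.column_stack([z[:, 0]] * 5 + [z[:, 1]] * 5)
X += 0.1 * rng.normal(size=X.shape)            # variables 0-4 and 5-9 co-vary

spca = SparsePCA(n_components=2, alpha=1.0, random_state=0)  # alpha tunes sparsity
spca.fit(X)
for k, comp in enumerate(spca.components_):
    support = np.flatnonzero(np.abs(comp) > 1e-8)
    print(f"factor {k}: nonzero on variables {support.tolist()}")
# Each sparse factor should be supported on one variable group, so the
# factor doubles as a cluster/feature-selection readout.
```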
We develop new algorithmic methods with provable guarantees for feature selection in regard to categorical data clustering. While feature selection is one of the most common approaches to reduce dimensionality in practice, most of the known feature selection methods are heuristics. We study the following mathematical model. We assume that there are some inadvertent (or undesirable) features of the input data that unnecessarily increase the cost of clustering. Consequently, we want to select a subset of the original features from the data such that there is a small-cost clustering on the selected features. More precisely, for given integers $\ell$ (the number of irrelevant features) and $k$ (the number of clusters), budget $B$, and a set of $n$ categorical data points (represented by $m$-dimensional vectors whose elements belong to a finite set of values $\Sigma$), we want to select $m-\ell$ relevant features such that the cost of any optimal $k$-clustering on these features does not exceed $B$. Here the cost of a cluster is the sum of Hamming distances ($\ell_0$-distances) between the selected features of the elements of the cluster and its center. The clustering cost is the total sum of the costs of the clusters. We use the framework of parameterized complexity to identify how the complexity of the problem depends on parameters $k$, $B$, and $|\Sigma|$. Our main result is an algorithm that solves the Feature Selection problem in time $f(k,B,|\Sigma|)\cdot m^{g(k,|\Sigma|)}\cdot n^2$ for some functions $f$ and $g$. In other words, the problem is fixed-parameter tractable parameterized by $B$ when $|\Sigma|$ and $k$ are constants. Our algorithm is based on a solution to a more general problem, Constrained Clustering with Outliers. We also complement our algorithmic findings with complexity lower bounds.
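While the fixed-parameter algorithm itself (via Constrained Clustering with Outliers) is involved, the objective is compact. A minimal sketch of the clustering cost on a selected feature subset, with a hypothetical toy instance:

```python
def hamming(u, v, features):
    """l0-distance between two categorical vectors, restricted to `features`."""
    return sum(u[f] != v[f] for f in features)

def clustering_cost(points, assignment, centers, features):
    """Total cost: for each point, the Hamming distance (on the m - ell
    selected features) to its assigned cluster center; feasible iff <= B."""
    return sum(hamming(p, centers[assignment[i]], features)
               for i, p in enumerate(points))

points     = [("a", "x", "r"), ("a", "y", "r"), ("b", "x", "s"), ("b", "y", "s")]
centers    = [("a", "x", "r"), ("b", "x", "s")]
assignment = [0, 0, 1, 1]
# Dropping the irrelevant middle feature (ell = 1) leaves a zero-cost
# 2-clustering on the remaining features {0, 2}:
print(clustering_cost(points, assignment, centers, features=[0, 2]))     # 0
print(clustering_cost(points, assignment, centers, features=[0, 1, 2]))  # 2
```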
An agglomerative clustering of random variables is proposed, where clusters of random variables sharing the maximum amount of multivariate mutual information are merged successively to form larger clusters. Compared to the previous info-clustering algorithms, the agglomerative approach allows the computation to stop earlier when clusters of desired size and accuracy are obtained. An efficient algorithm is also derived based on the submodularity of entropy and the duality between the principal sequence of partitions and the principal sequence for submodular functions.
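A simplified sketch of the agglomerative idea: repeatedly merge the two clusters whose joint variables share the most empirical mutual information, stopping early once the best achievable merge falls below a threshold. The paper merges by multivariate mutual information and exploits the submodularity of entropy; the pairwise-MI surrogate, plug-in estimator, and stopping threshold here are assumptions for illustration.

```python
import numpy as np
from collections import Counter

def H(cols):
    """Empirical joint entropy (bits) of the rows of a 2-D array."""
    p = np.array(list(Counter(map(tuple, cols)).values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

def agglomerate(X, stop_mi=0.1):
    """Merge the two clusters with the highest empirical MI between their
    joint variables; stop once the best possible merge is below stop_mi."""
    clusters = [[j] for j in range(X.shape[1])]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                mi = (H(X[:, clusters[a]]) + H(X[:, clusters[b]])
                      - H(X[:, clusters[a] + clusters[b]]))
                if mi > best:
                    best, pair = mi, (a, b)
        if best < stop_mi:
            break                     # early stopping once clusters are "done"
        a, b = pair
        clusters[a] += clusters.pop(b)
    return clusters

rng = np.random.default_rng(0)
u, v = rng.integers(0, 2, (2, 2000))
X = np.column_stack([u, u ^ (rng.random(2000) < 0.05), v, v])
print(agglomerate(X))  # e.g. [[0, 1], [2, 3]]: correlated pairs merge first
```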
Efficient link configuration in millimeter wave (mmWave) communication systems is a crucial yet challenging task due to the overhead imposed by beam selection. For vehicle-to-infrastructure (V2I) networks, side information from LIDAR sensors mounted on the vehicles has been leveraged to reduce the beam search overhead. In this letter, we propose a federated LIDAR-aided beam selection method for V2I mmWave communication systems. In the proposed scheme, connected vehicles collaborate to train a shared neural network (NN) on their locally available LIDAR data during normal operation of the system. We also propose a reduced-complexity convolutional NN (CNN) classifier architecture and LIDAR preprocessing, which significantly outperforms previous works in terms of both performance and complexity.
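A schematic PyTorch sketch of the two ingredients named above: a small CNN classifier over a LIDAR-derived occupancy grid, and FedAvg-style averaging of locally trained weights. Layer sizes, the input representation, the number of beams, and the plain-averaging rule are illustrative assumptions, not the letter's architecture.

```python
import torch
import torch.nn as nn

class BeamClassifier(nn.Module):
    """Reduced-complexity CNN mapping a 1-channel LIDAR occupancy grid
    to logits over candidate beam indices. All sizes are assumptions."""
    def __init__(self, n_beams=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Linear(16 * 4 * 4, n_beams)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def fedavg(state_dicts):
    """One federated round: average the vehicles' locally trained weights."""
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(0)
            for k in state_dicts[0]}

# Vehicles train local copies on their own LIDAR data; the shared model
# is then updated with the averaged weights:
local_models = [BeamClassifier().state_dict() for _ in range(3)]
shared = BeamClassifier()
shared.load_state_dict(fedavg(local_models))
print(shared(torch.zeros(1, 1, 64, 64)).shape)  # torch.Size([1, 256])
```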
