In this paper, we present a novel unsupervised feature learning architecture, which consists of a multi-clustering integration module and a variant of the RBM termed the multi-clustering integration RBM (MIRBM). In the multi-clustering integration module, we apply three unsupervised clustering algorithms, K-means, affinity propagation and spectral clustering, to obtain three different clustering partitions (CPs) without any background knowledge or labels. A unanimous voting strategy is then used to generate a local clustering partition (LCP). The novel MIRBM model is the core feature-encoding part of the proposed unsupervised feature learning architecture. Its novelty is that the LCP is integrated as unsupervised guidance into one-step contrastive divergence (CD1) learning to guide the distribution of the hidden-layer features. For instances in the same LCP cluster, the hidden and reconstructed hidden-layer features of the MIRBM model tend to contract toward one another during training. Meanwhile, the LCP centers tend to disperse from each other as much as possible in the hidden and reconstructed hidden layers during training. The experiments demonstrate that the proposed unsupervised feature learning architecture has more powerful feature representation and generalization capability than the state-of-the-art graph-regularized RBM (GraphRBM) for clustering tasks on the Microsoft Research Asia Multimedia (MSRA-MM) 2.0 dataset.
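A minimal sketch of the multi-clustering integration module described above, reconstructed from the abstract (the scikit-learn estimators, toy data, and cluster counts are our assumptions, not the authors' settings). Unanimous voting is implemented by grouping instances on their triple of base labels, so an LCP cluster contains only instances that all three partitions keep together:

```python
import numpy as np
from sklearn.cluster import KMeans, AffinityPropagation, SpectralClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, n_features=16, centers=5, random_state=0)

# Three clustering partitions (CPs), obtained without labels or background knowledge.
cp1 = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
cp2 = AffinityPropagation(random_state=0).fit_predict(X)
cp3 = SpectralClustering(n_clusters=5, random_state=0).fit_predict(X)

# Unanimous voting: two instances share an LCP cluster only if every base
# partition assigned them to the same cluster, i.e. their label triples match.
triples = np.stack([cp1, cp2, cp3], axis=1)
uniq, lcp = np.unique(triples, axis=0, return_inverse=True)
lcp = lcp.ravel()  # LCP cluster index per instance
print(f"{len(uniq)} LCP clusters from unanimous voting")
```

The LCP would then serve as the unsupervised guidance signal in CD1 learning, pulling same-cluster hidden features together and pushing LCP centers apart.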
Feature selection is a widely used dimension-reduction technique for selecting feature subsets because of its interpretability. Many methods have been proposed and have achieved good results, but they are mainly concerned with the relationships between adjacent data points, while possible associations between data pairs that may not be adjacent are neglected. Different from previous methods, we propose a novel and very simple approach for unsupervised feature selection, named MMFS (Multi-step Markov transition probability for Feature Selection). The idea is to use multi-step Markov transition probabilities to describe the relation between any data pair. Two ways, from the positive and negative viewpoints, are employed to preserve the data structure after feature selection. From the positive viewpoint, the maximum transition probability that can be reached in a certain number of steps describes the relation between two points, and the features that best keep this compact data structure are selected. From the negative viewpoint, the minimum transition probability that can be reached in a certain number of steps describes the relation between two points, and, conversely, the features that least maintain this loose data structure are selected. The two ways can also be combined, so three algorithms are proposed in total. Our main contributions are a novel feature selection approach which uses multi-step transition probabilities to characterize the data structure, and three algorithms derived from the positive and negative aspects of preserving that structure. The performance of our approach is compared with state-of-the-art methods on eight real-world data sets, and the experimental results show that the proposed MMFS is effective for unsupervised feature selection.
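The core quantity of MMFS can be made concrete with a short sketch (our own minimal reconstruction, not the authors' implementation; the Gaussian-kernel similarity and the step count t are assumptions). It builds a row-stochastic transition matrix and takes element-wise maxima and minima over its first t powers, giving the positive-view and negative-view relations between every data pair:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def multi_step_relations(X, t=3, gamma=1.0):
    S = rbf_kernel(X, gamma=gamma)           # pairwise similarities
    P = S / S.sum(axis=1, keepdims=True)     # one-step Markov transition matrix
    powers, Pk = [], P.copy()
    for _ in range(t):                       # P, P^2, ..., P^t
        powers.append(Pk)
        Pk = Pk @ P
    stack = np.stack(powers)
    # Positive view: max probability reachable within t steps (compact structure).
    # Negative view: min probability reachable within t steps (loose structure).
    return stack.max(axis=0), stack.min(axis=0)

X = np.random.RandomState(0).rand(50, 10)
A_max, A_min = multi_step_relations(X)
# Features would then be scored by how well data restricted to a feature subset
# preserves A_max (positive view) or least maintains A_min (negative view).
```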
Partial multi-label learning (PML) models the scenario where each training instance is annotated with a set of candidate labels, of which only some are relevant. The PML problem arises in real-world scenarios, as it is difficult and even impossible to obtain precisely labeled samples. Several PML solutions have been proposed to combat being misled by the irrelevant labels concealed in the candidate label sets, but they generally focus on the smoothness assumption in the feature space or the low-rank assumption in the label space, while ignoring the negative information between features and labels. Specifically, if two instances have largely overlapping candidate labels, their ground-truth labels should be similar irrespective of their feature similarity; whereas if they are dissimilar in both the feature and candidate label spaces, their ground-truth labels should be dissimilar. To achieve a credible predictor on PML data, we propose a novel approach called PML-LFC (Partial Multi-label Learning with Label and Feature Collaboration). PML-LFC estimates the confidence values of the relevant labels for each instance using similarity in both the label and feature spaces, and trains the desired predictor with the estimated confidence values. PML-LFC obtains the predictor and the latent label matrix in a reciprocally reinforcing manner via a unified model, and develops an alternating optimization procedure to optimize them. An extensive empirical study on both synthetic and real-world datasets demonstrates the superiority of PML-LFC.
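A hypothetical sketch of the label-feature collaboration idea (not the paper's unified optimization model; the Jaccard overlap, cosine similarity, mixing weight, and propagation loop are all our assumptions). Candidate-label overlap and feature similarity jointly smooth per-instance confidence values over the candidate labels:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def estimate_confidence(X, Y_cand, n_iter=10, alpha=0.5):
    """X: (n, d) feature matrix; Y_cand: (n, q) binary 0/1 candidate-label matrix."""
    feat_sim = cosine_similarity(X)
    inter = Y_cand @ Y_cand.T                         # candidate-label overlap counts
    union = Y_cand.sum(1)[:, None] + Y_cand.sum(1)[None, :] - inter
    label_sim = inter / np.maximum(union, 1)          # Jaccard similarity of label sets
    W = alpha * feat_sim + (1 - alpha) * label_sim    # label-feature collaboration
    W = W / W.sum(axis=1, keepdims=True)
    F = Y_cand / np.maximum(Y_cand.sum(1, keepdims=True), 1)
    for _ in range(n_iter):
        F = W @ F                                     # smooth confidences over W
        F = F * Y_cand                                # confidence stays on candidates
        F = F / np.maximum(F.sum(1, keepdims=True), 1e-12)
    return F                                          # estimated label confidences
```

A predictor would then be trained against these confidence values; in the paper the predictor and the latent label matrix are refined jointly rather than in this one-shot fashion.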
Multi-view clustering is an important research topic due to its capability to utilize complementary information from multiple views. However, few methods consider the negative impact caused by views with unclear clustering structures, which results in poor multi-view clustering performance. To address this drawback, we propose self-supervised discriminative feature learning for deep multi-view clustering (SDMVC). Concretely, deep autoencoders are applied to learn embedded features for each view independently. To leverage the complementary multi-view information, we concatenate all views' embedded features to form global features, which can overcome the negative impact of some views' unclear clustering structures. In a self-supervised manner, pseudo-labels are obtained to build a unified target distribution for multi-view discriminative feature learning. During this process, global discriminative information can be mined to supervise all views to learn more discriminative features, which in turn are used to update the target distribution. Moreover, this unified target distribution enables SDMVC to learn consistent cluster assignments, achieving clustering consistency across multiple views while preserving their feature diversity. Experiments on various types of multi-view datasets show that SDMVC achieves state-of-the-art performance.
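The self-supervised step can be sketched with DEC-style soft assignments (a simplified reconstruction under our own assumptions; the concatenation and the sharpened target distribution follow the abstract, while the Student's t kernel follows DEC):

```python
import numpy as np

def soft_assign(Z, centers, alpha=1.0):
    # Student's t similarity between global features and cluster centers (as in DEC).
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    # Sharpen the soft assignments; this unified target supervises every view.
    p = q ** 2 / q.sum(axis=0)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
z_views = [rng.rand(100, 8), rng.rand(100, 8)]    # embedded features per view
Z = np.concatenate(z_views, axis=1)               # global features
centers = Z[rng.choice(len(Z), size=3, replace=False)]
P = target_distribution(soft_assign(Z, centers))  # pseudo-label target distribution
# Each view's encoder would be trained to match P, and P is then re-estimated.
```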
Feature selection is a core area of data mining, with a recent innovation of graph-driven unsupervised feature selection for linked data. In this setting we have a dataset $\mathbf{Y}$ consisting of $n$ instances each with $m$ features, and a corresponding $n$-node graph (whose adjacency matrix is $\mathbf{A}$) with an edge indicating that the two instances are similar. Existing efforts for unsupervised feature selection on attributed networks have explored either directly regenerating the links by solving for $f$ such that $f(\mathbf{y}_i,\mathbf{y}_j) \approx \mathbf{A}_{i,j}$, or finding community structure in $\mathbf{A}$ and using the features in $\mathbf{Y}$ to predict these communities. However, graph-driven unsupervised feature selection remains an understudied area with respect to exploring more complex guidance. Here we take the novel approach of first building a block model on the graph and then using the block model for feature selection. That is, we discover $\mathbf{F}\mathbf{M}\mathbf{F}^T \approx \mathbf{A}$ and then find a subset of features $\mathcal{S}$ that induces another graph preserving both $\mathbf{F}$ and $\mathbf{M}$. We call our approach Block Model Guided Unsupervised Feature Selection (BMGUFS). Experimental results show that our method outperforms the state of the art on several real-world public datasets in finding high-quality features for clustering.
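The block-model step can be illustrated with a toy solver (assumptions: plain projected gradient descent on the squared Frobenius residual with random nonnegative initialization; the paper's actual solver and its feature-selection objective are not reproduced here):

```python
import numpy as np

def fit_block_model(A, k=4, lr=1e-3, n_iter=2000, seed=0):
    """Fit F (n x k memberships) and M (k x k block interactions) with F M F^T ~ A."""
    rng = np.random.RandomState(seed)
    n = A.shape[0]
    F = 0.1 * rng.rand(n, k)
    M = 0.1 * rng.rand(k, k)
    for _ in range(n_iter):
        R = F @ M @ F.T - A                       # residual of the block model
        grad_F = 2 * (R @ F @ M.T + R.T @ F @ M)  # gradient of ||R||_F^2 w.r.t. F
        grad_M = 2 * (F.T @ R @ F)                # gradient w.r.t. M
        F = np.clip(F - lr * grad_F, 0.0, None)   # project back to nonnegative
        M = np.clip(M - lr * grad_M, 0.0, None)
    return F, M

rng = np.random.RandomState(1)
A = (rng.rand(30, 30) > 0.8).astype(float)
A = np.maximum(A, A.T)                            # symmetric toy adjacency matrix
F, M = fit_block_model(A)
# Feature selection would then seek a subset S whose induced graph preserves F and M.
```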
Short-term load forecasting is a critical element of energy management systems in power systems. In recent years, probabilistic load forecasting (PLF) has gained increased attention for its ability to provide uncertainty information that helps improve the reliability and economics of system operation. This paper proposes a two-stage probabilistic load forecasting framework that integrates the point forecast as a key feature into PLF. In the first stage, all related features are utilized to train a point forecast model, which also yields the feature importances. In the second stage, the probabilistic forecasting model is trained using the point forecast as a feature, together with the selected feature subsets. During testing, the framework produces both the point forecasts and the final probabilistic load forecast results. Numerical results on ISO New England demand data demonstrate the effectiveness of the proposed approach for hour-ahead load forecasting, using gradient boosting regression for the point forecasting and quantile regression neural networks for the probabilistic forecasting.
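A toy sketch of the two-stage framework (using scikit-learn's quantile gradient boosting in place of the paper's quantile regression neural networks; the synthetic data and feature set are placeholders):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.rand(500, 6)                            # e.g. calendar and weather features
y = X @ rng.rand(6) + 0.1 * rng.randn(500)      # synthetic hourly "load"

# Stage 1: point forecast model; its feature importances can drive feature selection.
point_model = GradientBoostingRegressor(random_state=0).fit(X, y)
X_aug = np.column_stack([X, point_model.predict(X)])  # point forecast as a feature
# (In practice the training-set point forecasts would come from out-of-sample
# folds to avoid leaking the target into the second stage.)

# Stage 2: one quantile model per level yields the probabilistic forecast.
quantile_models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X_aug, y)
    for q in (0.1, 0.5, 0.9)
}
bands = {q: m.predict(X_aug) for q, m in quantile_models.items()}
```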