We introduce a new method for high-dimensional discriminant analysis, which we call multiDA. We achieve this by constructing a hybrid model that integrates a multiclass diagonal discriminant analysis model with feature selection components. Our feature selection component simplifies to weights that are simple functions of likelihood ratio statistics, allowing natural comparisons with traditional hypothesis testing methods. We provide heuristic arguments suggesting desirable asymptotic properties of our algorithm with regard to feature selection. We compare our method with several other approaches, showing marked improvements in prediction accuracy, interpretability of chosen features, and algorithm run time. We demonstrate these strengths by showing strong classification performance on publicly available high-dimensional datasets, as well as through multiple simulation studies. We make available an R package implementing our approach.
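The link between feature weights and likelihood ratio statistics can be illustrated with a minimal sketch. It is not the multiDA model itself: it assumes a single feature, Gaussian classes with a shared variance, and the textbook statistic n log(RSS_null / RSS_alt) for testing "one common mean" against "one mean per class". The function name `lr_statistic` and the toy data are hypothetical.

```python
import math

def lr_statistic(values, labels):
    """Likelihood ratio statistic for one feature (illustrative helper).

    Null model: all samples share one Gaussian mean.
    Alternative: each class has its own mean, with a shared variance.
    Assumes the within-class residual sum of squares is nonzero.
    """
    n = len(values)
    grand_mean = sum(values) / n
    rss_null = sum((x - grand_mean) ** 2 for x in values)
    rss_alt = 0.0
    for c in set(labels):
        group = [x for x, y in zip(values, labels) if y == c]
        class_mean = sum(group) / len(group)
        rss_alt += sum((x - class_mean) ** 2 for x in group)
    # -2 log(likelihood ratio) reduces to n * log(sigma0^2 / sigma1^2),
    # and the 1/n factors in the two variance estimates cancel.
    return n * math.log(rss_null / rss_alt)

labels = [0, 0, 0, 1, 1, 1]
informative = [0.1, -0.2, 0.0, 5.1, 4.9, 5.0]   # class means differ
noise = [1.0, -1.0, 0.0, 1.0, -1.0, 0.0]        # class means coincide

print(lr_statistic(informative, labels))
print(lr_statistic(noise, labels))
```

A feature whose class means differ yields a large statistic, while a feature with identical class means yields zero, which is the behaviour a selection weight built from such statistics would rely on.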
Optimal transport (OT) is a machine learning problem with applications including distribution comparison, feature selection, and generative adversarial networks. In this paper, we propose feature-robust optimal transport (FROT) for high-dimensional data, which solves high-dimensional OT problems using feature selection to avoid the curse of dimensionality. Specifically, we find a transport plan based on discriminative features. To this end, we formulate the FROT problem as a min--max optimization problem. We then propose a convex formulation of the FROT problem and solve it using a Frank--Wolfe-based optimization algorithm, in which the subproblem can be solved efficiently using the Sinkhorn algorithm. Since FROT finds the transport plan from selected features, it is robust to noise features. To show the effectiveness of FROT, we apply the FROT algorithm to the layer selection problem in deep neural networks for semantic correspondence. Through synthetic and benchmark experiments, we demonstrate that the proposed method can find a strong correspondence by determining important layers. We show that the FROT algorithm achieves state-of-the-art performance on real-world semantic correspondence datasets.
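For orientation, the Sinkhorn algorithm mentioned above can be sketched in a few lines. This is plain entropy-regularized OT between two histograms, not FROT or its Frank--Wolfe outer loop. The function name `sinkhorn`, the regularization strength `eps`, the iteration count, and the toy cost matrix are all assumptions chosen for illustration.

```python
import math

def sinkhorn(a, b, cost, eps=0.5, n_iters=500):
    """Entropy-regularized OT plan between histograms a and b (sketch)."""
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
print([sum(row) for row in plan])               # row marginals, close to a
print([sum(col) for col in zip(*plan)])         # column marginals, close to b
```

Smaller values of `eps` bring the plan closer to the unregularized optimum but can underflow in this naive form; log-domain updates are the usual remedy.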
Feature selection is an important tool for dealing with high-dimensional data. In the unsupervised case, many popular algorithms aim to maintain the structure of the original data. In this paper, we propose a simple and effective feature selection algorithm that enhances sample similarity preservation through a new perspective, topology preservation, which is represented by persistence diagrams from computational topology. The method is built upon a unified feature selection framework called IVFS, which is inspired by the random subset method. The scheme is flexible and can handle cases where the problem is analytically intractable. The proposed algorithm preserves both the pairwise distances and the topological patterns of the full data well. We demonstrate that our algorithm provides satisfactory performance under a sharp sub-sampling rate, which supports efficient implementation of the proposed method on large-scale datasets. Extensive experiments validate the effectiveness of the proposed feature selection scheme.
Missing data are a concern in many real-world data sets, and imputation methods are often needed to estimate the missing values, but data sets with excessive missingness and high dimensionality challenge most approaches to imputation. Here we show that appropriate feature selection can be an effective preprocessing step for imputation, allowing for more accurate imputation and subsequent model predictions. The key feature of this preprocessing is that it incorporates uncertainty: by accounting for uncertainty due to missingness when selecting features, we can reduce the degree of missingness while also limiting the number of uninformative features used to build predictive models. We introduce a method to perform uncertainty-aware feature selection (UAFS), provide a theoretical motivation, and test UAFS on both real and synthetic problems, demonstrating that it improves the accuracy of imputations across a variety of data sets and levels of missingness. Improved imputation due to UAFS also results in improved prediction accuracy when performing supervised learning on the imputed data sets. Our UAFS method is general and can be fruitfully coupled with a variety of imputation methods.
Gaussian graphical models are widely utilized to infer and visualize networks of dependencies between continuous variables. However, inferring the graph is difficult when the sample size is small compared to the number of variables. To reduce the number of parameters to estimate in the model, we propose a non-asymptotic model selection procedure supported by strong theoretical guarantees based on an oracle inequality and a minimax lower bound. The covariance matrix of the model is approximated by a block-diagonal matrix. The structure of this matrix is detected by thresholding the sample covariance matrix, where the threshold is selected using the slope heuristic. Based on the block-diagonal structure of the covariance matrix, the estimation problem is divided into several independent problems: subsequently, the network of dependencies between variables is inferred using the graphical lasso algorithm in each block. The performance of the procedure is illustrated on simulated data. An application to a real gene expression dataset with a limited sample size is also presented: the dimension reduction allows attention to be objectively focused on interactions among smaller subsets of genes, leading to a more parsimonious and interpretable modular network.
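The block-detection step described above can be sketched as follows. Variables are grouped into blocks by thresholding the absolute sample covariance and taking connected components of the resulting graph. The function name `covariance_blocks` is hypothetical, the threshold is fixed by hand rather than chosen by the slope heuristic, and the per-block graphical lasso step is not shown.

```python
def covariance_blocks(S, threshold):
    """Partition variable indices into blocks (illustrative helper).

    Two variables i and j are linked when |S[i][j]| > threshold;
    each connected component of that graph forms one diagonal block.
    """
    p = len(S)
    seen = [False] * p
    blocks = []
    for start in range(p):
        if seen[start]:
            continue
        # Depth-first search from an unvisited variable.
        stack = [start]
        seen[start] = True
        block = []
        while stack:
            i = stack.pop()
            block.append(i)
            for j in range(p):
                if not seen[j] and abs(S[i][j]) > threshold:
                    seen[j] = True
                    stack.append(j)
        blocks.append(sorted(block))
    return blocks

# Toy sample covariance with two strongly coupled pairs of variables.
S = [
    [1.00, 0.80, 0.05, 0.00],
    [0.80, 1.00, 0.02, 0.01],
    [0.05, 0.02, 1.00, 0.60],
    [0.00, 0.01, 0.60, 1.00],
]
print(covariance_blocks(S, threshold=0.1))  # [[0, 1], [2, 3]]
```

Each returned block could then be passed to a separate graph estimation problem, which is what makes the small-sample setting tractable.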
In many artificial intelligence and computer vision systems, the same object can be observed from distinct viewpoints or by diverse sensors, which makes it challenging to recognize objects across different, even heterogeneous, views. Multi-view discriminant analysis (MvDA) is an effective multi-view subspace learning method that finds a discriminant common subspace by jointly learning multiple view-specific linear projections for object recognition from multiple views, in a non-pairwise way. In this paper, we propose a kernel version of multi-view discriminant analysis, called kernel multi-view discriminant analysis (KMvDA). To overcome the well-known computational bottleneck of kernel methods, we also study the performance of using random Fourier features (RFF) to approximate Gaussian kernels in KMvDA for large-scale learning. We develop a theoretical analysis of the stability of this approximation. We also conduct experiments on several popular multi-view datasets to illustrate the effectiveness of our proposed strategy.
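The random Fourier feature approximation referred to above can be sketched in isolation from KMvDA. The sketch assumes the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) and the standard cosine feature map with frequencies drawn from N(0, sigma^-2 I). The names `make_rff` and `gaussian_kernel`, the seed, and the number of features are hypothetical choices.

```python
import math
import random

def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel value, used here only as a reference."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def make_rff(dim, n_features, sigma, seed=0):
    """Return a feature map z with z(x) . z(y) approximating the kernel."""
    rng = random.Random(seed)
    # Frequencies sampled from the kernel's spectral density N(0, I / sigma^2).
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
         for _ in range(n_features)]
    # Random phases, uniform on [0, 2*pi].
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(sum(wk * xk for wk, xk in zip(w, x)) + bk)
                for w, bk in zip(W, b)]

    return features

x = [0.5, -1.0, 2.0]
y = [0.0, -0.5, 1.5]
z = make_rff(dim=3, n_features=4000, sigma=1.0)
approx = sum(p * q for p, q in zip(z(x), z(y)))
print(approx, gaussian_kernel(x, y, sigma=1.0))
```

The approximation error shrinks roughly as one over the square root of the number of features, so the feature count trades accuracy against the cost of the downstream linear method.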