
A multivariate adaptive stochastic search method for dimensionality reduction in classification

Published by Tian Siva Tian
Publication date: 2010
Research field: Mathematical Statistics
Research language: English





High-dimensional classification has become an increasingly important problem. In this paper we propose a Multivariate Adaptive Stochastic Search (MASS) approach that first reduces the dimension of the data space and then applies a standard classification method to the reduced space. One key advantage of MASS is that it automatically adapts to mimic variable selection methods such as the Lasso, variable combination methods such as PCA, or methods that combine the two approaches. This adaptivity allows MASS to perform well in situations where pure variable selection or pure variable combination methods fail. Another major advantage of our approach is that MASS can accurately project the data into very low-dimensional nonlinear, as well as linear, spaces. In each iteration, MASS uses a stochastic search algorithm to select a handful of optimal projection directions from a large number of random directions. We provide some theoretical justification for MASS and demonstrate its strengths in an extensive range of simulation studies and real-world data sets by comparing it with many classical and modern classification methods.
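The abstract describes the search only at a high level, so the following is a minimal sketch of the general idea (score many random projection directions, keep a few, refine them, then classify in the reduced space). It is not the authors' MASS algorithm. The scoring rule (a two-class Fisher ratio), the perturbation scheme, every parameter name (`n_candidates`, `n_perturb`, `step`, `n_keep`, `n_iter`) and the nearest-centroid classifier are illustrative assumptions.

```python
import math
import random

# Illustrative sketch only: the scoring rule, perturbation scheme and
# classifier below are assumptions, not the MASS procedure itself.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def random_direction(dim, rng):
    # A Gaussian vector scaled to unit length is uniform on the sphere.
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def fisher_score(direction, X, y):
    """Squared gap between the two class means of the 1-D projection,
    divided by the pooled within-class variance (labels must be 0 and 1)."""
    proj = [dot(direction, x) for x in X]
    groups = {0: [], 1: []}
    for p, label in zip(proj, y):
        groups[label].append(p)
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    within = sum((p - means[c]) ** 2 for c, g in groups.items() for p in g) / len(proj)
    return (means[0] - means[1]) ** 2 / (within + 1e-12)


def stochastic_direction_search(X, y, n_keep=2, n_candidates=200,
                                n_perturb=20, step=0.1, n_iter=10, seed=0):
    """Each iteration scores fresh random directions plus small perturbations
    of the directions kept so far, then keeps the n_keep best-scoring ones."""
    rng = random.Random(seed)
    dim = len(X[0])
    kept = []
    for _ in range(n_iter):
        candidates = [random_direction(dim, rng) for _ in range(n_candidates)]
        for d in kept:
            candidates.append(d)
            for _ in range(n_perturb):
                candidates.append(normalize([a + step * rng.gauss(0.0, 1.0) for a in d]))
        candidates.sort(key=lambda d: fisher_score(d, X, y), reverse=True)
        kept = candidates[:n_keep]
    return kept


def project(X, directions):
    # Each sample becomes a short vector of its coordinates along the directions.
    return [[dot(d, x) for d in directions] for x in X]


def fit_centroids(Z, y):
    centroids = {}
    for c in set(y):
        rows = [z for z, label in zip(Z, y) if label == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, Z):
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: dist2(z, centroids[c])) for z in Z]


# Toy data: 10 features, only feature 0 separates the two classes.
data_rng = random.Random(1)
y = [i % 2 for i in range(100)]
X = [[(2.0 if label else -2.0) + data_rng.gauss(0.0, 1.0)]
     + [data_rng.gauss(0.0, 1.0) for _ in range(9)] for label in y]

directions = stochastic_direction_search(X, y)
Z = project(X, directions)
centroids = fit_centroids(Z, y)
accuracy = sum(p == t for p, t in zip(predict(centroids, Z), y)) / len(y)
print("weight on feature 0:", round(abs(directions[0][0]), 2))
print("training accuracy:", accuracy)
```

On this toy data the first kept direction should put most of its weight on feature 0, which resembles the variable-selection behaviour the abstract says MASS can mimic; if the signal were spread across several features, the best direction would instead mix them, closer to a variable-combination method. The two kept directions here may end up nearly identical; a practical method would presumably encourage diversity among them.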




Read also

Julia Fukuyama, 2017
When working with large biological data sets, exploratory analysis is an important first step for understanding the latent structure and for generating hypotheses to be tested in subsequent analyses. However, when the number of variables is large compared to the number of samples, standard methods such as principal components analysis give results that are unstable and difficult to interpret. To mitigate these problems, we have developed a method that allows the analyst to incorporate side information about the relationships between the variables in a way that encourages similar variables to have similar loadings on the principal axes. This leads to a low-dimensional representation of the samples that both describes the latent structure and has axes that are interpretable in terms of groups of closely related variables. The method is derived by placing a prior that encodes the relationships between the variables on the data and carrying the analysis through on the posterior distributions of the samples. We show that our method does well at reconstructing true latent structure in simulated data, and we also demonstrate it on a dataset investigating the effects of antibiotics on the composition of bacteria in the human gut.
In this paper, we develop a local rank correlation measure that quantifies the performance of dimension reduction methods. The local rank correlation is easily interpretable and robust against the extreme skewness of nearest-neighbor distributions in high dimensions. We study several benchmark datasets and find that the local rank correlation closely corresponds to our visual interpretation of the quality of the output. In addition, we demonstrate that the local rank correlation is useful for estimating the intrinsic dimensionality of the original data and for selecting suitable values of the tuning parameters used in some algorithms.
In this work, we present a quantum neighborhood preserving embedding and a quantum local discriminant embedding for dimensionality reduction and classification. We demonstrate that these two algorithms have an exponential speedup over their respective classical counterparts. Along the way, we propose a variational quantum generalized eigenvalue solver that finds the generalized eigenvalues and eigenstates of a matrix pencil $(\mathcal{G},\mathcal{S})$. As a proof of principle, we implement our algorithm to solve $2^5\times 2^5$ generalized eigenvalue problems. Finally, our results can be output in either quantum or classical form and applied directly in a subsequent quantum or classical machine learning process.
Spectral dimensionality reduction methods enable linear separation of complex data with high-dimensional features in a reduced space. However, these methods do not always give the desired results because of irregularities or uncertainties in the data. Thus, we consider aggressively modifying the scales of the features to obtain the desired classification. Using prior knowledge of the labels of some of the samples to specify the Fiedler vector, we formulate an eigenvalue problem for a linear matrix pencil whose eigenvector contains the feature scaling factors. The resulting factors can modify the features of all samples so that they form clusters in the reduced space according to the known labels. In this study, we propose new supervised dimensionality reduction methods that use the feature scaling associated with spectral clustering. Numerical experiments show that the proposed methods outperform well-established supervised methods on toy problems with more samples than features, and are more robust regarding clustering than existing methods. The proposed methods also outperform existing methods in classification on real-world gene expression profiles of cancers, which have more features than samples. Furthermore, the feature scaling tends to improve the clustering and classification accuracies of existing unsupervised methods as the proportion of training data increases.
Accurate forecasting of mortality data is important for managing pension funds and pricing life insurance in actuarial science. Age-specific mortality forecasting in the US poses a challenging problem in high-dimensional time series analysis. Prior attempts use traditional dimension reduction techniques to avoid the curse of dimensionality and then forecast mortality by forecasting the extracted features. However, a dimension reduction method suited to good forecasting has remained elusive. To address this, we propose a novel approach that seeks features that not only represent the original data well but also capture as much of the time-serial dependence as possible. The proposed method is adaptive for the US mortality data and enjoys good statistical performance. Our method performs better than existing approaches, in particular the Lee-Carter model, a benchmark in mortality analysis. Based on the forecasting results, we generate more accurate estimates of future life expectancies and life annuity prices than those obtained with the Lee-Carter model, which can have a great financial impact on life insurers and social security systems. Furthermore, various simulations illustrate scenarios under which our method has advantages and help explain its good performance on the mortality data.