Robust Principal Component Analysis Using Statistical Estimators

652 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Peratham Wiriyathammabhum Mr.

تاريخ النشر 2012

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Peratham Wiriyathammabhum - Boonserm Kijsirikul

الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Principal Component Analysis (PCA) finds a linear mapping and maximizes the variance of the data which makes PCA sensitive to outliers and may cause wrong eigendirection. In this paper, we propose techniques to solve this problem; we use the data-centering method and reestimate the covariance matrix using robust statistic techniques such as median, robust scaling which is a booster to data-centering and Huber M-estimator which measures the presentation of outliers and reweight them with small values. The results on several real world data sets show that our proposed method handles outliers and gains better results than the original PCA and provides the same accuracy with lower computation cost than the Kernel PCA using the polynomial kernel in classification tasks.

قيم البحث

433 - Ronny Luss , Alexandre dAspremont 2008

In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of variance in the data while having only a limited number of nonzero coefficients. PCA is often used as a simple clustering technique and sparse factors allow us here to interpret the clusters in terms of a reduced set of variables. We begin with a brief introduction and motivation on sparse PCA and detail our implementation of the algorithm in dAspremont et al. (2005). We then apply these results to some classic clustering and feature selection problems arising in biology.

الذكاء الاصطناعي التعلم الآلي البرمجيات الرياضية

On Robust Probabilistic Principal Component Analysis using Multivariate $t$-Distributions

214 - Yiping Guo , Howard D. Bondell 2020

Principal Component Analysis (PCA) is a common multivariate statistical analysis method, and Probabilistic Principal Component Analysis (PPCA) is its probabilistic reformulation under the framework of Gaussian latent variable model. To improve the ro bustness of PPCA, it has been proposed to change the underlying Gaussian distributions to multivariate $t$-distributions. Based on the representation of $t$-distribution as a scale mixture of Gaussians, a hierarchical model is used for implementation. However, although the robust PPCA methods work reasonably well for some simulation studies and real data, the hierarchical model implemented does not yield the equivalent interpretation. In this paper, we present a set of equivalent relationships between those models, and discuss the performance of robust PPCA methods using different multivariate $t$-distributed structures through several simulation studies. In doing so, we clarify a current misrepresentation in the literature, and make connections between a set of hierarchical models for robust PPCA.

المنهجية التعلم الالي

High Dimensional Process Monitoring Using Robust Sparse Probabilistic Principal Component Analysis

99 - Mohammad Nabhan , Yajun Mei , Jianjun Shi 2019

High dimensional data has introduced challenges that are difficult to address when attempting to implement classical approaches of statistical process control. This has made it a topic of interest for research due in recent years. However, in many ca ses, data sets have underlying structures, such as in advanced manufacturing systems. If extracted correctly, efficient methods for process control can be developed. This paper proposes a robust sparse dimensionality reduction approach for correlated high-dimensional process monitoring to address the aforementioned issues. The developed monitoring technique uses robust sparse probabilistic PCA to reduce the dimensionality of the data stream while retaining interpretability. The proposed methodology utilizes Bayesian variational inference to obtain the estimates of a probabilistic representation of PCA. Simulation studies were conducted to verify the efficacy of the proposed methodology. Furthermore, we conducted a case study for change detection for in-line Raman spectroscopy to validate the efficiency of our proposed method in a practical scenario.

تطبيقات الإحصاء المنهجية

Reionization constraints using Principal Component Analysis

399 - Sourav Mitra , T. Roy Choudhury , Andrea Ferrara 2010

Using a semi-analytical model developed by Choudhury & Ferrara (2005) we study the observational constraints on reionization via a principal component analysis (PCA). Assuming that reionization at z>6 is primarily driven by stellar sources, we decomp ose the unknown function N_{ion}(z), representing the number of photons in the IGM per baryon in collapsed objects, into its principal components and constrain the latter using the photoionization rate obtained from Ly-alpha forest Gunn-Peterson optical depth, the WMAP7 electron scattering optical depth and the redshift distribution of Lyman-limit systems at z sim 3.5. The main findings of our analysis are: (i) It is sufficient to model N_{ion}(z) over the redshift range 2<z<14 using 5 parameters to extract the maximum information contained within the data. (ii) All quantities related to reionization can be severely constrained for z<6 because of a large number of data points whereas constraints at z>6 are relatively loose. (iii) The weak constraints on N_{ion}(z) at z>6 do not allow to disentangle different feedback models with present data. There is a clear indication that N_{ion}(z) must increase at z>6, thus ruling out reionization by a single stellar population with non-evolving IMF, and/or star-forming efficiency, and/or photon escape fraction. The data allows for non-monotonic N_{ion}(z) which may contain sharp features around z sim 7. (iv) The PCA implies that reionization must be 99% completed between 5.8<z<10.3 (95% confidence level) and is expected to be 50% complete at z approx 9.5-12. With future data sets, like those obtained by Planck, the z>6 constraints will be significantly improved.

علم الكونيات والفيزياء الفلكية Nongalactic

Submodular Load Clustering with Robust Principal Component Analysis

274 - Yishen Wang , Xiao Lu , Yiran Xu 2019

Traditional load analysis is facing challenges with the new electricity usage patterns due to demand response as well as increasing deployment of distributed generations, including photovoltaics (PV), electric vehicles (EV), and energy storage system s (ESS). At the transmission system, despite of irregular load behaviors at different areas, highly aggregated load shapes still share similar characteristics. Load clustering is to discover such intrinsic patterns and provide useful information to other load applications, such as load forecasting and load modeling. This paper proposes an efficient submodular load clustering method for transmission-level load areas. Robust principal component analysis (R-PCA) firstly decomposes the annual load profiles into low-rank components and sparse components to extract key features. A novel submodular cluster center selection technique is then applied to determine the optimal cluster centers through constructed similarity graph. Following the selection results, load areas are efficiently assigned to different clusters for further load analysis and applications. Numerical results obtained from PJM load demonstrate the effectiveness of the proposed approach.

التعلم الآلي التعلم الالي