
Robust Independent Component Analysis via Minimum Divergence Estimation

Published by Hung Hung
Publication date: 2012
Research field: Mathematical Statistics
Paper language: English





Independent component analysis (ICA) has been shown to be useful in many applications. However, most ICA methods are sensitive to data contamination and outliers. In this article we introduce a general minimum U-divergence framework for ICA, which covers some standard ICA methods as special cases. Within the U-family we focus on the gamma-divergence because of its desirable property of super robustness, which yields the proposed method, gamma-ICA. Statistical properties and technical conditions for the consistency of gamma-ICA are rigorously studied. In the limiting case, the analysis leads to a necessary and sufficient condition for the consistency of MLE-ICA, and this condition is weaker than the one known in the literature. Since the parameter of interest in ICA is an orthogonal matrix, a geometrical algorithm based on gradient flows on the special orthogonal group is introduced to implement gamma-ICA. Furthermore, a data-driven selection procedure for the gamma value, which is critical to the performance of gamma-ICA, is developed. The performance of gamma-ICA, especially its robustness, is compared with that of standard ICA methods through experimental studies using simulated data and image data.
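The abstract does not spell out the geometrical algorithm, so the following is only a minimal sketch of the general idea of a gradient-type update that stays on the special orthogonal group. It is not the gamma-ICA algorithm: the objective `toy_objective`, its gradient `toy_gradient`, and the target rotation `R` are hypothetical stand-ins, and the Cayley transform is used here as a convenient substitute for an exact geodesic (matrix exponential) step.

```python
import numpy as np


def skew(M):
    """Return the skew-symmetric part of a square matrix."""
    return 0.5 * (M - M.T)


def cayley_step(W, euclid_grad, step):
    """One descent step on SO(p) that keeps W orthogonal with det(W) = 1.

    The Euclidean gradient is projected to a skew-symmetric direction
    A = skew(W^T G), and W is multiplied by the Cayley transform of
    -step * A, which is always a rotation matrix.
    """
    A = skew(W.T @ euclid_grad)
    S = -step * A
    I = np.eye(W.shape[0])
    # Cayley transform: (I - S/2)^{-1} (I + S/2), orthogonal for skew S.
    Q = np.linalg.solve(I - 0.5 * S, I + 0.5 * S)
    return W @ Q


def toy_objective(W, R):
    """Hypothetical objective: half the squared Frobenius distance to R."""
    return 0.5 * np.sum((W - R) ** 2)


def toy_gradient(W, R):
    """Euclidean gradient of toy_objective with respect to W."""
    return W - R


if __name__ == "__main__":
    angle = 1.0  # assumed target: rotation about the z-axis by 1 radian
    R = np.array([
        [np.cos(angle), -np.sin(angle), 0.0],
        [np.sin(angle), np.cos(angle), 0.0],
        [0.0, 0.0, 1.0],
    ])
    W = np.eye(3)
    for _ in range(200):
        W = cayley_step(W, toy_gradient(W, R), step=0.1)
    print("objective:", toy_objective(W, R))
    print("orthogonality error:", np.linalg.norm(W.T @ W - np.eye(3)))
```

In an actual ICA setting the toy gradient would be replaced by the gradient of the chosen contrast (for gamma-ICA, the gamma-divergence criterion described in the paper) evaluated on whitened data; the point of the sketch is only that multiplicative updates of this form never leave the group, so no re-orthogonalization is needed.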




Read also

Independent component analysis (ICA) has been widely used for blind source separation in many fields such as brain imaging analysis, signal processing and telecommunication. Many statistical techniques based on M-estimates have been proposed for estimating the mixing matrix. Recently, several nonparametric methods have been developed, but in-depth analysis of asymptotic efficiency has not been available. We analyze ICA using semiparametric theories and propose a straightforward estimate based on the efficient score function by using B-spline approximations. The estimate is asymptotically efficient under moderate conditions and exhibits better performance than standard ICA methods in a variety of simulations.
Functional principal component analysis (FPCA) has been widely used to capture major modes of variation and reduce dimensions in functional data analysis. However, standard FPCA based on the sample covariance estimator does not work well in the presence of outliers. To address this challenge, a new robust functional principal component analysis approach based on the functional pairwise spatial sign (PASS) operator, termed PASS FPCA, is introduced, where we propose estimation procedures for both eigenfunctions and eigenvalues with and without measurement error. Compared to existing robust FPCA methods, the proposed one requires weaker distributional assumptions to conserve the eigenspace of the covariance function. In particular, a class of distributions called the weakly functional coordinate symmetric (weakly FCS) is introduced that allows for severe asymmetry and is strictly larger than the functional elliptical distribution class, the latter of which has been widely used in the robust statistics literature. The robustness of the PASS FPCA is demonstrated via simulation studies and analyses of accelerometry data from a large-scale epidemiological study of physical activity in older women that partly motivates this work.
Compositional data represent a specific family of multivariate data, where the information of interest is contained in the ratios between parts rather than in absolute values of single parts. The analysis of such data is challenging, as the application of standard multivariate analysis tools to the raw observations can lead to spurious results. Hence, it is appropriate to apply certain transformations prior to further analysis. One popular multivariate data analysis tool is independent component analysis. Independent component analysis aims to find statistically independent components in the data and as such might be seen as an extension of principal component analysis. In this paper we examine an approach for applying independent component analysis to compositional data while respecting the nature of such data, and demonstrate the usefulness of this procedure on a metabolomic data set.
Fast Independent Component Analysis (FastICA) is a component separation algorithm based on the levels of non-Gaussianity. Here we apply FastICA to the component separation problem of the microwave background including carbon monoxide (CO) line emissions that are found to contaminate the PLANCK High Frequency Instrument (HFI) data. Specifically, we prepare 100GHz, 143GHz, and 217GHz mock microwave sky maps including galactic thermal dust, NANTEN CO line, and the Cosmic Microwave Background (CMB) emissions, and then estimate the independent components based on the kurtosis. We find that FastICA can successfully estimate the CO component as the first independent component in our deflection algorithm, as its distribution has the largest degree of non-Gaussianity among the components. By subtracting the CO and the dust components from the original sky maps, we will be able to make an unbiased estimate of the cosmological CMB angular power spectrum.
Fan et al. [$\mathit{Annals}$ $\mathit{of}$ $\mathit{Statistics}$ $\textbf{47}$(6) (2019) 3009-3031] proposed a distributed principal component analysis (PCA) algorithm to significantly reduce the communication cost between multiple servers. In this paper, we robustify their distributed algorithm by using robust covariance matrix estimators respectively proposed by Minsker [$\mathit{Annals}$ $\mathit{of}$ $\mathit{Statistics}$ $\textbf{46}$(6A) (2018) 2871-2903] and Ke et al. [$\mathit{Statistical}$ $\mathit{Science}$ $\textbf{34}$(3) (2019) 454-471] instead of the sample covariance matrix. We extend the deviation bound of robust covariance estimators with bounded fourth moments to the case of the heavy-tailed distribution under only bounded $2+\epsilon$ moments assumption. The theoretical results show that after the shrinkage or truncation treatment for the sample covariance matrix, the statistical error rate of the final estimator produced by the robust algorithm is the same as that of sub-Gaussian tails, when $\epsilon \geq 2$ and the sampling distribution is symmetric innovation. When $2 > \epsilon > 0$, the rate with respect to the sample size of each server is slower than that of the bounded fourth moment assumption. Extensive numerical results support the theoretical analysis, and indicate that the algorithm performs better than the original distributed algorithm and is robust to heavy-tailed data and outliers.