Principal component analysis is a statistical method that reduces the number of relevant variables in a data set. This paper discusses its application to burst spectra and afterglows. The analysis indicates that three of the eight principal components suffice to describe the variability of the data. The correlation between the spectral index $\alpha$ and the redshift suggests that the thermal emission component becomes more dominant at larger redshifts.
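One common way to decide that "three of eight components suffice" is to take the smallest number of components whose cumulative explained variance reaches a threshold. The sketch below does this under stated assumptions: the 95% threshold, the function name `choose_num_components`, and the synthetic data (three hidden factors driving eight observed variables) are hypothetical, and the abstract does not state which selection criterion the paper actually uses.

```python
import numpy as np

def choose_num_components(X, threshold=0.95):
    """Smallest k whose leading principal components explain >= threshold of total variance."""
    Xc = X - X.mean(axis=0)                      # center each variable
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    ratios = s**2 / np.sum(s**2)                 # explained-variance ratio per component
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return min(k, len(ratios)), ratios           # guard against round-off when threshold is 1.0

# Hypothetical data: 8 observed variables driven by 3 hidden factors plus small noise.
rng = np.random.default_rng(0)
n = 2000
Z = rng.standard_normal((n, 3)) * np.array([3.0, 2.0, 1.5])
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
X_demo = Z @ Q[:, :3].T + 0.01 * rng.standard_normal((n, 8))
k_demo, ratios_demo = choose_num_components(X_demo)
```

Here the first two components explain roughly 85% of the variance, so the 95% rule keeps three; a different threshold can change the answer, which is why the criterion should always be reported.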
We show how to efficiently project a vector onto the top principal components of a matrix without explicitly computing these components. Specifically, we introduce an iterative algorithm that provably computes the projection using few calls to any black-box routine for ridge regression. By avoiding explicit principal component analysis (PCA), our algorithm is the first with no runtime dependence on the number of top principal components. We show that it yields a fast iterative method for the popular principal component regression problem, giving the first major runtime improvement over the naive method of combining PCA with regression. To achieve our results, we first observe that ridge regression can be used to obtain a smooth projection onto the top principal components. We then sharpen this approximation to the true projection using a low-degree polynomial approximation to the matrix step function. Step function approximation is a topic of long-standing interest in scientific computing. We extend prior theory by constructing polynomials with a simple iterative structure and rigorously analyzing their behavior under limited precision.
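The two ingredients above can be illustrated in a few lines. Ridge regression with right-hand side $Av$ returns $(A^\top A + \lambda I)^{-1} A^\top A v$, which maps each eigenvalue $\sigma^2$ of $A^\top A$ to $\sigma^2/(\sigma^2+\lambda)$, a smooth step equal to $1/2$ at $\sigma^2=\lambda$. The sketch then sharpens it by repeatedly applying the cubic $f(x)=3x^2-2x^3$, using only further ridge calls. This cubic is a swapped-in, well-known sharpening polynomial, not the paper's construction; `np.linalg.solve` stands in for the black-box ridge solver; the function names and demo matrix are hypothetical. The explicit eigenvectors `V` are used only to check the result.

```python
import numpy as np

def ridge_smooth_projection(A, lam, v):
    # One ridge call: argmin_x ||A x - A v||^2 + lam ||x||^2 = (A^T A + lam I)^{-1} A^T A v.
    # Each eigenvalue s^2 of A^T A is mapped to s^2 / (s^2 + lam).
    AtA = A.T @ A
    return np.linalg.solve(AtA + lam * np.eye(AtA.shape[0]), A.T @ (A @ v))

def sharpened_projection(A, lam, v, depth):
    # Applies f composed `depth` times to the smooth projection R, with f(x) = 3x^2 - 2x^3.
    # f fixes 0, 1/2 and 1 and pushes values below 1/2 toward 0 and above 1/2 toward 1,
    # so the result approaches the projection onto directions with s^2 > lam.
    # Cost: 3**depth ridge calls.
    if depth == 0:
        return ridge_smooth_projection(A, lam, v)
    w1 = sharpened_projection(A, lam, v, depth - 1)   # M v
    w2 = sharpened_projection(A, lam, w1, depth - 1)  # M^2 v
    w3 = sharpened_projection(A, lam, w2, depth - 1)  # M^3 v
    return 3 * w2 - 2 * w3

# Hypothetical matrix with known singular values; 10, 8, 6 lie above sqrt(lam) = 2.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((20, 6)))
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A_demo = U @ np.diag([10.0, 8.0, 6.0, 0.5, 0.3, 0.1]) @ V.T
v_demo = rng.standard_normal(6)
approx = sharpened_projection(A_demo, 4.0, v_demo, depth=3)
exact = V[:, :3] @ (V[:, :3].T @ v_demo)   # explicit PCA, for checking only
```

Note that nothing in `sharpened_projection` depends on how many components lie above the threshold, which mirrors the runtime property claimed in the abstract; the accuracy does depend on how well the eigenvalues are separated from $\lambda$.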
Principal component analysis (PCA) is an important tool for exploring data. The conventional approach to PCA yields a solution that favours structures with large variance. This makes it sensitive to outliers and can obscure interesting underlying structures. One equivalent definition of PCA is that it seeks the subspaces that maximize the sum of squared pairwise distances between data projections. This definition allows more flexibility in the analysis of principal components, which is useful for enhancing PCA. In this paper we introduce scales into PCA by maximizing the sum of pairwise distances between projections only over pairs of data points whose distances lie within a chosen interval $[l,u]$. The resulting principal component decompositions in Multiscale PCA depend on the point $(l,u)$ in the plane, and for each point we define projectors onto the principal components. Cluster analysis of these projectors reveals the structures in the data at various scales. Each structure is described by the eigenvectors at the medoid point of the cluster that represents it. We also use the distortion of projections as a criterion for choosing an appropriate scale, especially for data with outliers. The method was tested on both artificial and real data. For data with multiscale structures, it was able to reveal the different structures and to reduce the effect of outliers on the principal component analysis.
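For a projector $P = WW^\top$, the sum $\sum \lVert W^\top(x_i - x_j)\rVert^2$ over a set of pairs equals $\operatorname{tr}(W^\top S W)$ with $S = \sum (x_i - x_j)(x_i - x_j)^\top$, so the maximizing directions are the top eigenvectors of $S$. The sketch below restricts $S$ to pairs whose distance lies in $[l,u]$. It assumes, as in the squared-distance definition of PCA, that the restricted objective also uses squared distances; the function name `multiscale_pca` and the demo data are hypothetical, and the clustering of projectors and medoid selection are not shown.

```python
import numpy as np

def multiscale_pca(X, l, u, k):
    """Top-k directions maximizing squared projected distances over pairs with l <= ||x_i - x_j|| <= u."""
    n, d = X.shape
    diffs = X[:, None, :] - X[None, :, :]        # (n, n, d) pairwise differences
    dist = np.linalg.norm(diffs, axis=-1)        # (n, n) pairwise distances
    iu = np.triu_indices(n, k=1)                 # each unordered pair once
    mask = (dist[iu] >= l) & (dist[iu] <= u)     # keep only pairs at the chosen scale
    D = diffs[iu][mask]                          # (n_selected_pairs, d)
    S = D.T @ D                                  # sum of outer products over selected pairs
    evals, evecs = np.linalg.eigh(S)             # eigh returns ascending eigenvalues
    W = evecs[:, np.argsort(evals)[::-1][:k]]    # top-k eigenvectors
    return W, W @ W.T, S                         # directions, projector, pair scatter matrix

rng = np.random.default_rng(1)
X_demo = rng.standard_normal((50, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
W_small, P_small, S_small = multiscale_pca(X_demo, 0.0, 3.0, k=2)   # small-scale pairs only
W_all, P_all, S_all = multiscale_pca(X_demo, 0.0, np.inf, k=2)      # all pairs
```

With the interval $[0,\infty)$ every pair is kept, and $S$ equals $n$ times the centered scatter matrix, so ordinary PCA is recovered as a special case.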
Principal Component Analysis (PCA) is one of the most important methods for handling high-dimensional data. However, most studies of PCA aim to minimize the loss after projection, usually measured by Euclidean distance, whereas in some fields angle distance is known to be more important for analysis. In this paper, we propose a method that adds constraints on the factors to unify Euclidean distance and angle distance. Because the objective and constraints are nonconvex, an optimal solution is hard to obtain. We propose an alternating linearized minimization method to solve the problem, with a provable convergence guarantee and rate. Experiments on synthetic and real-world datasets validate the effectiveness of our method and demonstrate its advantages over state-of-the-art clustering methods.
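The abstract does not give the exact objective or constraints, so the following is only a generic sketch of proximal alternating linearized minimization (PALM) on an assumed problem: factor $X \approx UW^\top$ with every row of $U$ constrained to unit norm. For unit vectors $\lVert u_i - u_j\rVert^2 = 2 - 2\cos\theta_{ij}$, so under this assumed constraint the Euclidean distance between factor rows is a monotone function of their angle. Each block takes a gradient step of size $1/L$ (with $L$ the block Lipschitz constant) followed by projection onto its constraint set, which guarantees the objective never increases. All names and the demo data are hypothetical.

```python
import numpy as np

def project_unit_rows(U):
    # Euclidean projection onto {U : every row has unit norm}: rescale each nonzero row.
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)

def objective(X, U, W):
    return 0.5 * np.linalg.norm(X - U @ W.T) ** 2

def palm_factorize(X, rank, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = project_unit_rows(rng.standard_normal((n, rank)))
    W = rng.standard_normal((d, rank))
    history = [objective(X, U, W)]
    for _ in range(iters):
        # U-block: linearize the loss in U, step by 1/L_U, then project onto unit rows.
        L_U = max(np.linalg.norm(W.T @ W, 2), 1e-12)   # spectral norm = Lipschitz constant
        grad_U = (U @ W.T - X) @ W
        U = project_unit_rows(U - grad_U / L_U)
        # W-block: unconstrained, so the proximal step is a plain gradient step.
        L_W = max(np.linalg.norm(U.T @ U, 2), 1e-12)
        grad_W = (U @ W.T - X).T @ U
        W = W - grad_W / L_W
        history.append(objective(X, U, W))
    return U, W, history

rng = np.random.default_rng(2)
X_demo = rng.standard_normal((30, 10))
U_demo, W_demo, hist = palm_factorize(X_demo, rank=3, iters=100)
```

The step size $1/L$ is the simplest safe choice; the convergence-rate analysis mentioned in the abstract typically relies on additional structure (for example, the Kurdyka-Łojasiewicz property) that this sketch does not check.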
We consider the problem of principal component analysis from a data matrix where the entries of each column have undergone some unknown permutation, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that for sufficiently generic data, and up to a permutation of the coordinates of the ambient space, there is a unique subspace of minimal dimension that explains the data. We show that a permutation-invariant system of polynomial equations has finitely many solutions, with each solution corresponding to a row permutation of the ground-truth data matrix. Allowing for missing entries on top of permutations leads to the problem of unlabeled matrix completion, for which we give theoretical results of a similar flavor. We also propose a two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data has been permuted. Stage I of this pipeline employs robust-PCA methods to estimate the ground-truth column space. Equipped with the column space, Stage II applies methods for linear regression without correspondences to restore the permuted data. A computational study reveals encouraging findings, including the ability of UPCA to handle face images from the Extended Yale-B database with arbitrarily permuted patches of arbitrary size in $0.3$ seconds on a standard desktop computer.
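To make the second stage concrete, the sketch below assumes a basis `B` of the column space is already available (the output of the first stage) and restores a permuted column by trying every reordering and keeping the one whose least-squares residual onto $\operatorname{span}(B)$ is smallest. This exhaustive search is a swapped-in baseline for linear regression without correspondences, feasible only for very short columns since it tries all $m!$ orderings; it is not the method used in the paper. The function name and demo data are hypothetical.

```python
import itertools
import numpy as np

def restore_by_exhaustive_search(B, y):
    """Return the reordering of y closest to span(B), and its least-squares residual."""
    best_z, best_res = None, np.inf
    for perm in itertools.permutations(range(len(y))):
        z = y[list(perm)]                                # candidate un-permuted column
        coef, *_ = np.linalg.lstsq(B, z, rcond=None)     # regress onto the column space
        res = np.linalg.norm(B @ coef - z)
        if res < best_res:
            best_z, best_res = z, res
    return best_z, best_res

rng = np.random.default_rng(3)
B_demo = rng.standard_normal((6, 2))                     # hypothetical column-space basis
x_true = B_demo @ rng.standard_normal(2)                 # ground-truth column
y_perm = x_true[rng.permutation(6)]                      # observed, with shuffled entries
x_restored, residual = restore_by_exhaustive_search(B_demo, y_perm)
```

For generic data only the correct ordering lands exactly in the two-dimensional column space, which is the intuition behind the uniqueness results; practical methods avoid the factorial search, for instance by exploiting the assumption that only a fraction of entries is permuted.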
From a Principal Component Analysis (PCA) of 78 high-quality quasar spectra at $z\sim3$ in the SDSS-DR7, we derive the principal components characterizing the QSO continuum over the full available wavelength range. The shape of the mean continuum is similar to that measured at low redshift ($z\sim1$), but the equivalent widths of the emission lines are larger at low redshift. We calculate the correlation between fluxes at different wavelengths and find that the emission-line fluxes in the red part of the spectrum are correlated with those in the blue part. We construct a projection matrix to predict the continuum in the Lyman-$\alpha$ forest from the red part of the spectrum. We apply this matrix to quasars in the SDSS-DR7 to derive the redshift evolution of the mean flux in the Lyman-$\alpha$ forest due to absorption by intergalactic neutral hydrogen. A change in the evolution of the mean flux is apparent around $z\sim3$, with a steeper decrease of the mean flux at higher redshifts. The same evolution is found when the continuum is estimated by extrapolating a power-law continuum fitted in the red part of the quasar spectrum, provided a correction derived from simple simulations is applied. Our findings are consistent with previous determinations using high spectral resolution data. We provide the PCA eigenvectors over the wavelength range 1020-2000 Å and the distribution of their weights, which can be used to simulate mock QSO spectra.
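A projection matrix of this kind can be built from the PCA eigenvectors: fit the component weights by least squares on the red pixels only, then evaluate the same components on the blue (forest) pixels. Writing $E_{\text{red}}$ and $E_{\text{blue}}$ for the eigenvector rows at those pixels, the predicted blue flux is $\mu_{\text{blue}} + E_{\text{blue}} E_{\text{red}}^{+}(f_{\text{red}} - \mu_{\text{red}})$. This is a minimal sketch under that assumption; the paper's exact construction may differ, and the function names, pixel split, number of components, and noise-free synthetic spectra are all hypothetical. Real spectra also need noise weighting and masking of absorption, which are omitted here.

```python
import numpy as np

def fit_pca(spectra, k):
    # Mean spectrum and top-k eigenvectors (as columns) of the training set.
    mean = spectra.mean(axis=0)
    _, _, Vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, Vt[:k].T                          # shapes (n_pix,), (n_pix, k)

def prediction_matrix(E, red, blue):
    # Maps red-side residual flux to blue-side residual flux via least-squares weights.
    return E[blue] @ np.linalg.pinv(E[red])

def predict_blue(mean, P, spectrum, red, blue):
    return mean[blue] + P @ (spectrum[red] - mean[red])

rng = np.random.default_rng(4)
n_pix = 40
blue = np.arange(n_pix) < 15                       # hypothetical forest pixels
red = ~blue
comps = rng.standard_normal((n_pix, 3))
train = 1.0 + rng.standard_normal((200, 3)) @ comps.T
mean, E = fit_pca(train, k=3)
P = prediction_matrix(E, red, blue)
new = 1.0 + comps @ rng.standard_normal(3)         # spectrum generated by the same components
pred = predict_blue(mean, P, new, red, blue)
```

Because the demo spectra are generated exactly by three components, the blue side is recovered exactly; with real quasars the accuracy depends on how strongly red-side and blue-side features are correlated, which is what the flux correlation analysis above measures.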