We study degeneracies between cosmological parameters and measurement errors from cosmic shear surveys using a principal component analysis of the Fisher matrix. We simulate realistic survey topologies with non-uniform sky coverage, and quantify the effect of survey geometry, depth and noise from intrinsic galaxy ellipticities on the parameter errors. This analysis allows us to optimise the survey geometry. Using the shear two-point correlation functions and the aperture mass dispersion, we study various degeneracy directions in a multi-dimensional parameter space spanned by Omega_m, Omega_Lambda, sigma_8, the shape parameter Gamma, the spectral index n_s, along with parameters that specify the distribution of source galaxies. If only three parameters are to be obtained from weak lensing data, a single principal component is dominant and contains all information about the main parameter degeneracies and their errors. The variance of the dominant principal component of the Fisher matrix shows a minimum for survey strategies which have small cosmic variance and measure the shear correlation up to several degrees [abridged].
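As a rough illustration of a principal component analysis of a Fisher matrix, the sketch below eigen-decomposes a small symmetric matrix. Each eigenvector is a direction in parameter space, and the variance along it is the inverse of its eigenvalue. The function name and the 3x3 toy matrix are hypothetical and are not taken from the survey simulations described above.

```python
import numpy as np

def fisher_principal_components(fisher):
    """Eigen-decompose a symmetric, positive-definite Fisher matrix.

    Returns (directions, variances). The columns of `directions` are unit
    vectors in parameter space, sorted from best to worst constrained;
    `variances` holds the variance along each direction (1 / eigenvalue).
    """
    fisher = np.asarray(fisher, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(fisher)   # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
    return eigvecs[:, order], 1.0 / eigvals[order]

# Hypothetical toy Fisher matrix for three parameters (illustrative numbers only).
toy_fisher = np.array([[4.0, 2.0, 0.0],
                       [2.0, 3.0, 1.0],
                       [0.0, 1.0, 2.0]])

directions, variances = fisher_principal_components(toy_fisher)
print(directions)
print(variances)
```

The last column of `directions` is the combination of parameters with the largest variance, which is where a parameter degeneracy would show up in this toy setting.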
We show how to efficiently project a vector onto the top principal components of a matrix, without explicitly computing these components. Specifically, we introduce an iterative algorithm that provably computes the projection using few calls to any black-box routine for ridge regression. By avoiding explicit principal component analysis (PCA), our algorithm is the first with no runtime dependence on the number of top principal components. We show that it can be used to give a fast iterative method for the popular principal component regression problem, giving the first major runtime improvement over the naive method of combining PCA with regression. To achieve our results, we first observe that ridge regression can be used to obtain a smooth projection onto the top principal components. We then sharpen this approximation to true projection using a low-degree polynomial approximation to the matrix step function. Step function approximation is a topic of long-term interest in scientific computing. We extend prior theory by constructing polynomials with simple iterative structure and rigorously analyzing their behavior under limited precision.
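The observation that ridge regression gives a smooth projection can be sketched as follows, assuming a data matrix A and a regularisation parameter lam (both names are illustrative). Solving the ridge problem with target A x yields (A^T A + lam I)^{-1} A^T A x, which scales the component of x along each right singular vector by s^2 / (s^2 + lam): close to 1 for large singular values and close to 0 for small ones. The sketch uses a direct dense solve as a stand-in for the black-box ridge routine and does not include the polynomial sharpening step.

```python
import numpy as np

def ridge_solve(A, b, lam):
    """Solve min_w ||A w - b||^2 + lam * ||w||^2 via the normal equations."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)

def smooth_projection(A, x, lam):
    """Softly project x onto the top principal components of A.

    One ridge regression with target A @ x; no eigenvectors are computed.
    """
    return ridge_solve(A, A @ x, lam)

# Illustrative random data (assumed shapes, not from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
x = rng.standard_normal(5)
print(smooth_projection(A, x, lam=2.0))
```

In this sketch lam plays the role of a soft threshold on the squared singular values: directions with s^2 well above lam pass almost unchanged, and those well below are suppressed.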
We study the estimators of various second-order weak lensing statistics, such as the shear correlation functions xi_pm and the aperture mass dispersion <M_ap^2>, which can be constructed directly from weak lensing shear maps. We compare the efficiency with which these estimators can be used to constrain cosmological parameters. To this end we introduce Karhunen-Loeve (KL) eigenmode analysis techniques for weak lensing surveys. These tools are shown to be very effective as diagnostics for optimising survey strategies. We demonstrate their usefulness for studying how angular binning, the depth and width of the survey, and noise contributions due to intrinsic ellipticities and the number density of source galaxies affect the estimation of cosmological parameters. Results from independent analyses of various parameters and from joint estimations are compared. We also study how degeneracies among various cosmological and survey parameters affect the eigenmodes associated with these parameters.
Principal component analysis (PCA) is an important tool in exploring data. The conventional approach to PCA leads to a solution which favours the structures with large variances. This is sensitive to outliers and could obfuscate interesting underlying structures. One of the equivalent definitions of PCA is that it seeks the subspaces that maximize the sum of squared pairwise distances between data projections. This definition opens up more flexibility in the analysis of principal components, which is useful in enhancing PCA. In this paper we introduce scales into PCA by maximizing only the sum of pairwise distances between projections for pairs of datapoints with distances within a chosen interval of values [l,u]. The resulting principal component decompositions in Multiscale PCA depend on the point (l,u) in the plane, and for each point we define projectors onto principal components. Cluster analysis of these projectors reveals the structures in the data at various scales. Each structure is described by the eigenvectors at the medoid point of the cluster which represents the structure. We also use the distortion of projections as a criterion for choosing an appropriate scale, especially for data with outliers. This method was tested on both artificial data distributions and real data. For data with multiscale structures, the method was able to reveal the different structures of the data and also to reduce the effect of outliers in the principal component analysis.
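The pairwise-distance view of PCA can be sketched as below. The sum over all pairs of the outer products (x_i - x_j)(x_i - x_j)^T equals n times the centred scatter matrix, so its eigenvectors are the usual principal components. Keeping only pairs whose distance lies in [l,u] gives one plausible reading of the scale-restricted decomposition. The function names are hypothetical, and the clustering of projectors and the distortion criterion are not included.

```python
import numpy as np

def pairwise_scatter(X, l=0.0, u=np.inf):
    """Scatter matrix built from pairs of points whose distance is in [l, u]."""
    diffs = X[:, None, :] - X[None, :, :]        # shape (n, n, d)
    dists = np.linalg.norm(diffs, axis=-1)       # shape (n, n)
    mask = (dists >= l) & (dists <= u)
    selected = diffs[mask]                       # shape (m, d)
    # Each unordered pair appears twice (i,j) and (j,i), hence the factor 0.5.
    return 0.5 * selected.T @ selected

def multiscale_pca(X, l=0.0, u=np.inf):
    """Eigenvalues and eigenvectors of the scale-restricted scatter matrix."""
    eigvals, eigvecs = np.linalg.eigh(pairwise_scatter(X, l, u))
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    return eigvals[order], eigvecs[:, order]

# Illustrative random data (assumed, not from the paper).
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3))
vals_all, vecs_all = multiscale_pca(X)                  # ordinary PCA directions
vals_loc, vecs_loc = multiscale_pca(X, l=0.0, u=1.0)    # short-range pairs only
```

With the default interval every pair is kept and the result matches ordinary PCA up to a factor of n in the eigenvalues. Narrowing the interval changes which pairs contribute and therefore which structures dominate.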
Principal Component Analysis (PCA) is one of the most important methods for handling high-dimensional data. However, most studies on PCA aim to minimize the loss after projection, which is usually measured by Euclidean distance, although in some fields angle distance is known to be more important and critical for analysis. In this paper, we propose a method that adds constraints on the factors to unify the Euclidean distance and the angle distance. However, due to the nonconvexity of the objective and constraints, the optimal solution is not easy to obtain. We propose an alternating linearized minimization method to solve the problem, with a provable convergence rate and guarantee. Experiments on synthetic data and real-world datasets have validated the effectiveness of our method and demonstrated its advantages over state-of-the-art clustering methods.
We consider the problem of principal component analysis from a data matrix where the entries of each column have undergone some unknown permutation, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that for generic enough data, and up to a permutation of the coordinates of the ambient space, there is a unique subspace of minimal dimension that explains the data. We show that a permutation-invariant system of polynomial equations has finitely many solutions, with each solution corresponding to a row permutation of the ground-truth data matrix. Allowing for missing entries on top of permutations leads to the problem of unlabeled matrix completion, for which we give theoretical results of similar flavor. We also propose a two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data has been permuted. Stage-I of this pipeline employs robust-PCA methods to estimate the ground-truth column-space. Equipped with the column-space, stage-II applies methods for linear regression without correspondences to restore the permuted data. A computational study reveals encouraging findings, including the ability of UPCA to handle face images from the Extended Yale-B database with arbitrarily permuted patches of arbitrary size in $0.3$ seconds on a standard desktop computer.