
Visualizing probabilistic models: Intensive Principal Component Analysis

 Added by Katherine Quinn
 Publication date 2018
 Field: Physics
 Language: English





Unsupervised learning makes manifest the underlying structure of data without curated training and specific problem definitions. However, the inference of relationships between data points is frustrated by the "curse of dimensionality" in high dimensions. Inspired by replica theory from statistical mechanics, we consider replicas of the system to tune the dimensionality and take the limit as the number of replicas goes to zero. The result is the intensive embedding, which is not only isometric (preserving local distances) but allows global structure to be more transparently visualized. We develop the Intensive Principal Component Analysis (InPCA) and demonstrate clear improvements in visualizations of the Ising model of magnetic spins, a neural network, and the dark energy cold dark matter ($\Lambda$CDM) model as applied to the Cosmic Microwave Background.




Read More

381 - Michael E. Wall 2002
This chapter describes gene expression analysis by Singular Value Decomposition (SVD), emphasizing initial characterization of the data. We describe SVD methods for visualization of gene expression data, representation of the data using a smaller number of variables, and detection of patterns in noisy gene expression data. In addition, we describe the precise relation between SVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix, enabling our descriptions to apply equally well to either method. Our aim is to provide definitions, interpretations, examples, and references that will serve as resources for understanding and extending the application of SVD and PCA to gene expression analysis.
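The SVD/PCA relation mentioned in this abstract can be checked numerically. The following is a minimal sketch on synthetic data (not gene expression data, and not code from the chapter): it assumes rows are samples and columns are variables, and that PCA is computed from the sample covariance matrix with the usual n - 1 normalization. All variable names are illustrative.

```python
import numpy as np

# Synthetic data: 50 samples (rows) of 5 correlated variables (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 5))

# Center each variable; both routes below operate on the centered matrix.
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: PCA as the eigendecomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]           # re-sort to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vars = s**2 / (n - 1)                   # squared singular values -> variances

# The variances coincide, and the principal axes agree up to sign.
variances_match = np.allclose(eigvals, svd_vars)
axes_match = np.allclose(np.abs(np.sum(eigvecs * Vt.T, axis=0)), 1.0)
print(variances_match, axes_match)
```

The right singular vectors play the role of the principal axes, and the squared singular values divided by n - 1 give the variance along each axis; the sign of each axis is arbitrary in both methods, which is why the comparison uses absolute values.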
73 - P. Tandon 2016
Performance of nuclear threat detection systems based on gamma-ray spectrometry often strongly depends on the ability to identify the part of measured signal that can be attributed to background radiation. We have successfully applied a method based on Principal Component Analysis (PCA) to obtain a compact null-space model of background spectra using PCA projection residuals to derive a source detection score. We have shown the method's utility in a threat detection system using mobile spectrometers in urban scenes (Tandon et al. 2012). While it is commonly assumed that measured photon counts follow a Poisson process, standard PCA makes a Gaussian assumption about the data distribution, which may be a poor approximation when photon counts are low. This paper studies whether and under what conditions PCA with a Poisson-based loss function (Poisson PCA) can outperform standard Gaussian PCA in modeling background radiation to enable more sensitive and specific nuclear threat detection.
We show how to efficiently project a vector onto the top principal components of a matrix, without explicitly computing these components. Specifically, we introduce an iterative algorithm that provably computes the projection using few calls to any black-box routine for ridge regression. By avoiding explicit principal component analysis (PCA), our algorithm is the first with no runtime dependence on the number of top principal components. We show that it can be used to give a fast iterative method for the popular principal component regression problem, giving the first major runtime improvement over the naive method of combining PCA with regression. To achieve our results, we first observe that ridge regression can be used to obtain a smooth projection onto the top principal components. We then sharpen this approximation to true projection using a low-degree polynomial approximation to the matrix step function. Step function approximation is a topic of long-term interest in scientific computing. We extend prior theory by constructing polynomials with simple iterative structure and rigorously analyzing their behavior under limited precision.
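The observation that ridge regression yields a smooth projection onto the top principal components can be illustrated with a small example. This is only a sketch of that one observation, not the iterative algorithm from the abstract: it solves the ridge system directly instead of calling a black-box routine, it omits the polynomial sharpening step, and the matrix, its spectrum, and the threshold `lam` are made-up values chosen so that the spectrum is well separated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 4

# Build A = U diag(sing) V^T with a well-separated spectrum (illustrative values).
U, _ = np.linalg.qr(rng.normal(size=(n, d)))   # orthonormal columns
V, _ = np.linalg.qr(rng.normal(size=(d, d)))   # orthogonal matrix
sing = np.array([10.0, 8.0, 0.1, 0.05])
A = U @ np.diag(sing) @ V.T

lam = 1.0                  # ridge parameter, also used as the eigenvalue threshold
y = rng.normal(size=d)     # vector to project

# Ridge regression of the target A @ y on A:
#   x = (A^T A + lam I)^{-1} A^T (A y)
# In the eigenbasis of A^T A this scales component i by s_i^2 / (s_i^2 + lam),
# which is close to 1 for large s_i^2 and close to 0 for small s_i^2.
G = A.T @ A
smooth = np.linalg.solve(G + lam * np.eye(d), G @ y)

# Exact projection onto the principal components whose eigenvalue s_i^2 >= lam.
top = V[:, sing**2 >= lam]
exact = top @ (top.T @ y)

rel_gap = np.linalg.norm(smooth - exact) / np.linalg.norm(y)
print(rel_gap)
```

With this spectrum the ridge filter values are roughly 0.990, 0.985, 0.0099 and 0.0025, so the smooth projection differs from the exact one by at most about 1.5% of the norm of `y`. When eigenvalues lie close to `lam` the filter is far from a step function, which is the gap the sharpening step described in the abstract is meant to close.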
Principal Component Analysis (PCA) is a common multivariate statistical analysis method, and Probabilistic Principal Component Analysis (PPCA) is its probabilistic reformulation under the framework of a Gaussian latent variable model. To improve the robustness of PPCA, it has been proposed to change the underlying Gaussian distributions to multivariate $t$-distributions. Based on the representation of the $t$-distribution as a scale mixture of Gaussians, a hierarchical model is used for implementation. However, although the robust PPCA methods work reasonably well for some simulation studies and real data, the hierarchical model implemented does not yield the equivalent interpretation. In this paper, we present a set of equivalent relationships between those models, and discuss the performance of robust PPCA methods using different multivariate $t$-distributed structures through several simulation studies. In doing so, we clarify a current misrepresentation in the literature, and make connections between a set of hierarchical models for robust PPCA.
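The scale-mixture representation referred to in this abstract can be sketched in the univariate case. The example below is not one of the paper's robust PPCA models; it only illustrates the standard fact that drawing a precision $u$ from a Gamma distribution with shape $\nu/2$ and rate $\nu/2$, and then drawing $x$ from a Gaussian with variance $1/u$, gives a Student $t$ variable with $\nu$ degrees of freedom. The value of `nu`, the sample size, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
nu = 10.0          # degrees of freedom (illustrative choice, must exceed 2 here)
n = 200_000        # number of samples

# Hierarchical draw: latent precision u, then a Gaussian with variance 1/u.
# NumPy's gamma takes (shape, scale), so rate nu/2 corresponds to scale 2/nu.
u = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
x_mixture = rng.normal(size=n) / np.sqrt(u)

# Direct draw from the Student t distribution for comparison.
x_direct = rng.standard_t(df=nu, size=n)

# Both samples should have variance close to nu / (nu - 2).
target_var = nu / (nu - 2)
print(x_mixture.var(), x_direct.var(), target_var)
```

The latent precision `u` is what makes the model robust: samples with small `u` have inflated variance, so outliers are absorbed by the heavy tails instead of distorting the fitted components.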
173 - Simona Cocco 2011
We consider the problem of inferring the interactions between a set of $N$ binary variables from the knowledge of their frequencies and pairwise correlations. The inference framework is based on the Hopfield model, a special case of the Ising model where the interaction matrix is defined through a set of patterns in the variable space, and is of rank much smaller than $N$. We show that Maximum Likelihood inference is deeply related to Principal Component Analysis when the amplitude of the pattern components, $\xi$, is negligible compared to $N^{1/2}$. Using techniques from statistical mechanics, we calculate the corrections to the patterns to first order in $\xi/N^{1/2}$. We stress that it is important to generalize the Hopfield model and include both attractive and repulsive patterns, to correctly infer networks with sparse and strong interactions. We present a simple geometrical criterion to decide how many attractive and repulsive patterns should be considered as a function of the sampling noise. We moreover discuss how many sampled configurations are required for a good inference, as a function of the system size, $N$, and of the amplitude, $\xi$. The inference approach is illustrated on synthetic and biological data.