Representing complex data using localized principal components with application to astronomical data


Published by Coryn Bailer-Jones

Publication date: 2007

Research field: Physics

Paper language: English

Author: Jochen Einbeck


Often the relation between the variables constituting a multivariate data space might be characterized by one or more of the terms "nonlinear", "branched", "disconnected", "bended", "curved", "heterogeneous", or, more generally, "complex". In these cases, simple principal component analysis (PCA) as a tool for dimension reduction can fail badly. Of the many alternative approaches proposed so far, local approximations of PCA are among the most promising. This paper will give a short review of localiz
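To illustrate why a local approximation can help, here is a minimal sketch. It is not the method reviewed in the paper (the abstract above is truncated), only one common form of localized PCA: partition the data with k-means and fit a separate one-component PCA in each cell. The helper names, the choice of k-means, and the synthetic half-circle data are all assumptions made for illustration.

```python
import numpy as np

def pca_residual(X, n_components=1):
    """Sum of squared distances from X to its best-fitting affine subspace."""
    Xc = X - X.mean(axis=0)
    # Squared singular values give the summed squared spread along each principal axis.
    s = np.linalg.svd(Xc, compute_uv=False)
    return float(np.sum(s[n_components:] ** 2))

def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm, used here only to define the local regions."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def local_pca_residual(X, k, n_components=1):
    """Total residual when each k-means cell gets its own PCA fit."""
    labels = kmeans_labels(X, k)
    return sum(pca_residual(X[labels == j], n_components)
               for j in range(k) if np.any(labels == j))

# Synthetic "curved" data: a noisy half circle in the plane.
rng = np.random.default_rng(1)
t = rng.uniform(0.0, np.pi, size=400)
X = np.column_stack([np.cos(t), np.sin(t)]) + 0.02 * rng.normal(size=(400, 2))

global_err = pca_residual(X)            # one straight line for all points
local_err = local_pca_residual(X, k=6)  # one straight line per local region
print(f"global PCA residual: {global_err:.2f}")
print(f"local PCA residual:  {local_err:.2f}")
```

A single global line cannot follow the arc, whereas several short local lines can. The local residual can never exceed the global one here, since the global line is itself a candidate fit within every cell.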


Boris Landa, Yoel Shkolnisky, 2016

This paper describes a fast and accurate method for obtaining steerable principal components from a large dataset of images, assuming the images are well localized in space and frequency. The obtained steerable principal components are optimal for expanding the images in the dataset and all of their rotations. The method relies upon first expanding the images using a series of two-dimensional Prolate Spheroidal Wave Functions (PSWFs), where the expansion coefficients are evaluated using a specially designed numerical integration scheme. Then, the expansion coefficients are used to construct a rotationally-invariant covariance matrix which admits a block-diagonal structure, and the eigen-decomposition of its blocks provides us with the desired steerable principal components. The proposed method is shown to be faster than existing methods, while providing appropriate error bounds which guarantee its accuracy.

Computer Vision and Pattern Recognition; Numerical Analysis

Identifying complex sources in large astronomical data using a coarse-grained complexity measure

Gary Segal, David Parkinson, Ray P. Norris, 2018

The volume of data that will be produced by the next generation of astrophysical instruments represents a significant opportunity for making unplanned and unexpected discoveries. Conversely, finding unexpected objects or phenomena within such large volumes of data presents a challenge that may best be solved using computational and statistical approaches. We present the application of a coarse-grained complexity measure for identifying interesting observations in large astronomical data sets. This measure, which has been termed apparent complexity, has been shown to model human intuition and perceptions of complexity. Apparent complexity is computationally efficient to derive and can be used to segment and identify interesting observations in very large data sets based on their morphological complexity. We show, using data from the Australia Telescope Large Area Survey, that apparent complexity can be combined with clustering methods to provide an automated process for distinguishing between images of galaxies which have been classified as having simple and complex morphologies. The approach generalizes well when applied to new data after being calibrated on a smaller data set, where it performs better than tested classification methods using pixel data. This generalizability positions apparent complexity as a suitable machine-learning feature for identifying complex observations with unanticipated features.
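The abstract does not spell out how the measure is computed. As a rough illustration only, the sketch below assumes a compression-based proxy: coarse-grain the image by block averaging, quantize it to a few intensity levels, and take the zlib-compressed byte length as the score. The function names, block size, number of levels, and test images are hypothetical and are not taken from the paper.

```python
import zlib
import numpy as np

def coarse_grain(image, block=4):
    """Average the image over non-overlapping block x block tiles."""
    h, w = image.shape
    h, w = h - h % block, w - w % block  # trim so the tiles divide evenly
    trimmed = image[:h, :w]
    return trimmed.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def compressed_size(image, block=4, levels=8):
    """Byte length of the zlib-compressed, coarse-grained, quantized image.

    Assumes pixel values lie in [0, 1].
    """
    coarse = coarse_grain(image, block)
    quantized = np.clip((coarse * levels).astype(np.int64), 0, levels - 1)
    return len(zlib.compress(quantized.astype(np.uint8).tobytes(), 9))

# A featureless image versus one with structure on the coarse-grained scale.
flat = np.full((64, 64), 0.5)
rng = np.random.default_rng(0)
patches = rng.integers(0, 8, size=(16, 16)) / 7.0
structured = np.kron(patches, np.ones((4, 4)))  # 16x16 patches blown up to 64x64

flat_size = compressed_size(flat)
structured_size = compressed_size(structured)
print(flat_size, structured_size)
```

Under these assumptions the featureless image compresses to far fewer bytes than the patchy one. A score of this kind could then be passed to a clustering step as a single feature per image.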

Instrumentation and Methods for Astrophysics

Derivative Principal Component Analysis for Representing the Time Dynamics of Longitudinal and Functional Data

Xiongtao Dai, Hans-Georg Muller, Wenwen Tao, 2017

We propose a nonparametric method to explicitly model and represent the derivatives of smooth underlying trajectories for longitudinal data. This representation is based on a direct Karhunen-Loève expansion of the unobserved derivatives and leads to the notion of derivative principal component analysis, which complements functional principal component analysis, one of the most popular tools of functional data analysis. The proposed derivative principal component scores can be obtained for irregularly spaced and sparsely observed longitudinal data, as typically encountered in biomedical studies, as well as for functional data which are densely measured. Novel consistency results and asymptotic convergence rates for the proposed estimates of the derivative principal component scores and other components of the model are derived under a unified scheme for sparse or dense observations and mild conditions. We compare the proposed representations for derivatives with alternative approaches in simulation settings and also in a wallaby growth curve application. It emerges that representations using the proposed derivative principal component analysis recover the underlying derivatives more accurately compared to principal component analysis-based approaches especially in settings where the functional data are represented with only a very small number of components or are densely sampled. In a second wheat spectra classification example, derivative principal component scores were found to be more predictive for the protein content of wheat than the conventional functional principal component scores.
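For orientation, the generic form of a Karhunen-Loève expansion applied to a derivative process can be written as follows. The notation is an assumption chosen for illustration (the abstract does not fix any symbols): $X'(t)$ is the derivative of a trajectory on a domain $\mathcal{T}$, $\nu$ its mean function, and $G_1$ its covariance function.

```latex
\begin{align}
  \nu(t) &= \mathbb{E}\,[X'(t)], \qquad
  G_1(s,t) = \operatorname{Cov}\bigl(X'(s),\, X'(t)\bigr), \\
  \int_{\mathcal{T}} G_1(s,t)\,\phi_k(s)\,ds &= \lambda_k\,\phi_k(t),
  \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge 0, \\
  X'(t) &= \nu(t) + \sum_{k=1}^{\infty} \xi_k\,\phi_k(t),
  \qquad \xi_k = \int_{\mathcal{T}} \bigl(X'(t) - \nu(t)\bigr)\,\phi_k(t)\,dt .
\end{align}
```

Here the $\phi_k$ are orthonormal eigenfunctions, and the scores $\xi_k$ are uncorrelated with mean zero and variance $\lambda_k$. In this notation the scores $\xi_k$ play the role of the derivative principal component scores mentioned in the abstract, and truncating the sum at a small number of terms gives a low-dimensional representation of the derivative.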

Methodology; Statistics Theory

The Virtual Astronomical Observatory: Re-engineering Access to Astronomical Data

R. J. Hanisch, 2015

The U.S. Virtual Astronomical Observatory was a software infrastructure and development project designed both to begin the establishment of an operational Virtual Observatory (VO) and to provide the U.S. coordination with the international VO effort. The concept of the VO is to provide the means by which an astronomer is able to discover, access, and process data seamlessly, regardless of its physical location. This paper describes the origins of the VAO, including the predecessor efforts within the U.S. National Virtual Observatory, and summarizes its main accomplishments. These accomplishments include the development of both scripting toolkits that allow scientists to incorporate VO data directly into their reduction and analysis environments and high-level science applications for data discovery, integration, analysis, and catalog cross-comparison. Working with the international community, and based on the experience from the software development, the VAO was a major contributor to international standards within the International Virtual Observatory Alliance. The VAO also demonstrated how an operational virtual observatory could be deployed, providing a robust operational environment in which VO services worldwide were routinely checked for aliveness and compliance with international standards. Finally, the VAO engaged in community outreach, developing a comprehensive web site with on-line tutorials, announcements, links to both U.S. and internationally developed tools and services, and exhibits and hands-on training .... All digital products of the VAO Project, including software, documentation, and tutorials, are stored in a repository for community access. The enduring legacy of the VAO is an increasing expectation that new telescopes and facilities incorporate VO capabilities during the design of their data management systems.

Instrumentation and Methods for Astrophysics

Elastic Maps and Nets for Approximating Principal Manifolds and Their Application to Microarray Data Visualization

A. N. Gorban, A. Y. Zinovyev, 2007

Principal manifolds are defined as lines or surfaces passing through "the middle" of data distribution. Linear principal manifolds (Principal Components Analysis) are routinely used for dimension reduction, noise filtering and data visualization. Recently, methods for constructing non-linear principal manifolds were proposed, including our elastic maps approach which is based on a physical analogy with elastic membranes. We have developed a general geometric framework for constructing "principal objects" of various dimensions and topologies with the simplest quadratic form of the smoothness penalty which allows very effective parallel implementations. Our approach is implemented in three programming languages (C++, Java and Delphi) with two graphical user interfaces (VidaExpert http://bioinfo.curie.fr/projects/vidaexpert and ViMiDa http://bioinfo-out.curie.fr/projects/vimida applications). In this paper we overview the method of elastic maps and present in detail one of its major applications: the visualization of microarray data in bioinformatics. We show that the method of elastic maps outperforms linear PCA in terms of data approximation, representation of between-point distance structure, preservation of local point neighborhood and representing point classes in low-dimensional spaces.

Data Analysis, Statistics and Probability; Biological Physics
