
Elastic Maps and Nets for Approximating Principal Manifolds and Their Application to Microarray Data Visualization

Published by: Alexander Gorban
Publication date: 2007
Research field: Physics
Paper language: English





Principal manifolds are defined as lines or surfaces passing through "the middle" of the data distribution. Linear principal manifolds (Principal Components Analysis) are routinely used for dimension reduction, noise filtering and data visualization. Recently, methods for constructing non-linear principal manifolds were proposed, including our elastic maps approach, which is based on a physical analogy with elastic membranes. We have developed a general geometric framework for constructing "principal objects" of various dimensions and topologies with the simplest quadratic form of the smoothness penalty, which allows very effective parallel implementations. Our approach is implemented in three programming languages (C++, Java and Delphi) with two graphical user interfaces (the VidaExpert http://bioinfo.curie.fr/projects/vidaexpert and ViMiDa http://bioinfo-out.curie.fr/projects/vimida applications). In this paper we give an overview of the method of elastic maps and present in detail one of its major applications: the visualization of microarray data in bioinformatics. We show that the method of elastic maps outperforms linear PCA in terms of data approximation, representation of between-point distance structure, preservation of local point neighborhoods and representation of point classes in low-dimensional spaces.
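The abstract does not spell out the energy being minimized, so the following is only a minimal sketch of what a quadratic elastic energy could look like for the simplest case, an open chain of nodes. The three-term form (approximation, stretching, bending), the function name `elastic_energy`, and the parameters `lam` and `mu` are assumptions made for illustration, not the authors' implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def elastic_energy(data, nodes, lam=0.1, mu=0.1):
    """Quadratic energy of an open chain of nodes embedded in data space.

    Assumed form (illustrative only):
      approximation: mean squared distance from each data point to its nearest node
      stretching:    lam * sum of squared edge lengths
      bending:       mu * sum of squared second differences at interior nodes
    """
    approximation = sum(min(sq_dist(x, y) for y in nodes) for x in data) / len(data)
    stretching = lam * sum(
        sq_dist(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)
    )
    bending = mu * sum(
        sum((a + c - 2 * b) ** 2
            for a, b, c in zip(nodes[i - 1], nodes[i], nodes[i + 1]))
        for i in range(1, len(nodes) - 1)
    )
    return approximation + stretching + bending


# Three collinear data points; compare a straight chain with a bent one.
data = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
bent = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
energy_straight = elastic_energy(data, straight)
energy_bent = elastic_energy(data, bent)
```

Because every term is quadratic in the node positions, minimizing over the nodes for a fixed assignment of points to nodes reduces to a linear system, which is one plausible reading of why the abstract calls this the simplest form of smoothness penalty.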


Read also

Multidimensional data distributions can have complex topologies and variable local dimensions. To approximate complex data, we propose a new type of low-dimensional "principal object": a principal cubic complex. This complex is a generalization of linear and non-linear principal manifolds and includes them as a particular case. To construct such an object, we combine a method of topological grammars with the minimization of an elastic energy defined for its embedment into multidimensional data space. The whole complex is presented as a system of nodes and springs and as a product of one-dimensional continua (represented by graphs), and the grammars describe how these continua transform during the process of optimal complex construction. The simplest case of a topological grammar ("add a node", "bisect an edge") is equivalent to the construction of "principal trees", an object useful in many practical applications. We demonstrate how it can be applied to the analysis of bacterial genomes and to the visualization of cDNA microarray data using the "metro map" representation. The preprint is supplemented by an animation: "How the topological grammar constructs branching principal components" (AnimatedBranchingPCA.gif).
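The two grammar operations named above can be illustrated on a plain graph. This is a sketch under stated assumptions: the list-plus-set graph representation and the helper names `add_node` and `bisect_edge` are hypothetical, and the step that would choose which operation to apply (by comparing elastic energies) is omitted.

```python
def add_node(nodes, edges, parent, position):
    """'Add a node': attach a new leaf at `position` to the existing node `parent`."""
    nodes.append(tuple(position))
    new_index = len(nodes) - 1
    edges.add((parent, new_index))
    return new_index


def bisect_edge(nodes, edges, edge):
    """'Bisect an edge': replace edge (a, b) by (a, m) and (m, b), m at the midpoint."""
    a, b = edge
    edges.remove(edge)
    midpoint = tuple((p + q) / 2 for p, q in zip(nodes[a], nodes[b]))
    nodes.append(midpoint)
    m = len(nodes) - 1
    edges.add((a, m))
    edges.add((m, b))
    return m


# Start from a single edge and grow a small branching tree.
nodes = [(0.0, 0.0), (2.0, 0.0)]
edges = {(0, 1)}
mid = bisect_edge(nodes, edges, (0, 1))         # inserts node 2 at (1.0, 0.0)
leaf = add_node(nodes, edges, mid, (1.0, 1.0))  # branches off the midpoint
```

Both operations add exactly one node and one edge, so a graph that starts as a tree stays a tree, which fits the abstract's remark that this grammar yields principal trees.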
D. J. Mikkelson, 2002
The overall design of the Integrated Spectral Analysis Workbench (ISAW), being developed at Argonne, provides for an extensible, highly interactive, collaborating set of viewers for neutron scattering data. Large arbitrary collections of spectra from multiple detectors can be viewed as an image, a scrolled list of individual graphs, or using a 3D representation of the instrument showing the detector positions. Data from an area detector can be displayed using a contour or intensity map as well as an interactive table. Selected spectra can be displayed in tables or on a conventional graph. A unique characteristic of these viewers is their interactivity and coordination. The position pointed at by the user in one viewer is sent to other viewers of the same DataSet so they can track the position and display relevant information. Specialized viewers for single crystal neutron diffractometers are being developed. A proof-of-concept viewer that directly displays the 3D reciprocal lattice from a complete series of runs on a single crystal diffractometer has been implemented.
The Mantid framework is a software solution developed for the analysis and visualization of neutron scattering and muon spin measurements. The framework is jointly developed by software engineers and scientists at the ISIS Neutron and Muon Facility and the Oak Ridge National Laboratory. The objectives, functionality and novel design aspects of Mantid are described.
ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web, or a number of different shared file systems. In order to analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, ROOT offers packages for complex data modeling and fitting, as well as multivariate classification based on machine learning techniques. Central to these analysis tools are the histogram classes, which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like PostScript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks (e.g. data mining in HEP) by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.
A principal component analysis (PCA) of clean microcalorimeter pulse records can be a first step beyond statistically optimal linear filtering of pulses towards a fully non-linear analysis. For PCA to be practical on spectrometers with hundreds of sensors, an automated identification of clean pulses is required. Robust forms of PCA are the subject of active research in machine learning. We examine a version known as coherence pursuit that is simple, fast, and well matched to the automatic identification of outlier records, as needed for microcalorimeter pulse analysis.
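The abstract gives no details of coherence pursuit, so the sketch below only illustrates the general idea as commonly described: normalize each record, measure how strongly it correlates with all other records, and treat the least coherent records as outliers. The scoring rule (sum of squared inner products), the function name `coherence_scores`, and the toy records are assumptions for illustration, not the paper's procedure.

```python
from math import sqrt


def coherence_scores(records):
    """Score each record by its summed squared correlation with all other records.

    Records are scaled to unit length first, so the score depends on pulse shape
    rather than amplitude. Low scores suggest outliers (assumed scoring rule).
    """
    unit = []
    for record in records:
        norm = sqrt(sum(v * v for v in record))
        if norm == 0.0:
            raise ValueError("cannot normalize an all-zero record")
        unit.append([v / norm for v in record])

    scores = []
    for i, u in enumerate(unit):
        score = sum(
            sum(a * b for a, b in zip(u, w)) ** 2
            for j, w in enumerate(unit)
            if j != i
        )
        scores.append(score)
    return scores


# Toy records: four pulses sharing one shape at different amplitudes, plus one
# record with a different shape. The data are invented for illustration.
records = [
    (1.0, 2.0, 1.0),
    (2.0, 4.0, 2.0),
    (1.0, 2.0, 1.1),
    (3.0, 6.0, 3.0),
    (1.0, -1.0, 1.0),
]
scores = coherence_scores(records)
least_coherent = min(range(len(scores)), key=scores.__getitem__)
```

In a full pipeline the highest-scoring records would then be passed to an ordinary PCA, so that the principal components are estimated from clean pulses only.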