ترغب بنشر مسار تعليمي؟ اضغط هنا

117 - M. Ball 2012
This document illustrates the technical layout and the expected performance of a Time Projection Chamber as the central tracking system of the PANDA experiment. The detector is based on a continuously operating TPC with Gas Electron Multiplier (GEM) amplification.
We review the current state of data mining and machine learning in astronomy. Data Mining can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potent ial to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
83 - Adam D Myers 2009
The use of photometric redshifts in cosmology is increasing. Often, however these photo-zs are treated like spectroscopic observations, in that the peak of the photometric redshift, rather than the full probability density function (PDF), is used. Th is overlooks useful information inherent in the full PDF. We introduce a new real-space estimator for one of the most used cosmological statistics, the 2-point correlation function, that weights by the PDF of individual photometric objects in a manner that is optimal when Poisson statistics dominate. As our estimator does not bin based on the PDF peak it substantially enhances the clustering signal by usefully incorporating information from all photometric objects that overlap the redshift bin of interest. As a real-world application, we measure QSO clustering in the Sloan Digital Sky Survey (SDSS). We find that our simplest binned estimator improves the clustering signal by a factor equivalent to increasing the survey size by a factor of 2-3. We also introduce a new implementation that fully weights between pairs of objects in constructing the cross-correlation and find that this pair-weighted estimator improves clustering signal in a manner equivalent to increasing the survey size by a factor of 4-5. Our technique uses spectroscopic data to anchor the distance scale and it will be particularly useful where spectroscopic data (e.g, from BOSS) overlaps deeper photometry (e.g.,from Pan-STARRS, DES or the LSST). We additionally provide simple, informative expressions to determine when our estimator will be competitive with the autocorrelation of spectroscopic objects. Although we use QSOs as an example population, our estimator can and should be applied to any clustering estimate that uses photometric objects.
We present recent results from the LCDM (Laboratory for Cosmological Data Mining; http://lcdm.astro.uiuc.edu) collaboration between UIUC Astronomy and NCSA to deploy supercomputing cluster resources and machine learning algorithms for the mining of t erascale astronomical datasets. This is a novel application in the field of astronomy, because we are using such resources for data mining, and not just performing simulations. Via a modified implementation of the NCSA cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million stars and galaxies in the Sloan Digital Sky Survey, improved distance measures, and a full exploitation of the simple but powerful k-nearest neighbor algorithm. A driving principle of this work is that our methods should be extensible from current terascale datasets to upcoming petascale datasets and beyond. We discuss issues encountered to-date, and further issues for the transition to petascale. In particular, disk I/O will become a major limiting factor unless the necessary infrastructure is implemented.
We apply machine learning in the form of a nearest neighbor instance-based algorithm (NN) to generate full photometric redshift probability density functions (PDFs) for objects in the Fifth Data Release of the Sloan Digital Sky Survey (SDSS DR5). We use a conceptually simple but novel application of NN to generate the PDFs - perturbing the object colors by their measurement error - and using the resulting instances of nearest neighbor distributions to generate numerous individual redshifts. When the redshifts are compared to existing SDSS spectroscopic data, we find that the mean value of each PDF has a dispersion between the photometric and spectroscopic redshift consistent with other machine learning techniques, being sigma = 0.0207 +/- 0.0001 for main sample galaxies to r < 17.77 mag, sigma = 0.0243 +/- 0.0002 for luminous red galaxies to r < ~19.2 mag, and sigma = 0.343 +/- 0.005 for quasars to i < 20.3 mag. The PDFs allow the selection of subsets with improved statistics. For quasars, the improvement is dramatic: for those with a single peak in their probability distribution, the dispersion is reduced from 0.343 to sigma = 0.117 +/- 0.010, and the photometric redshift is within 0.3 of the spectroscopic redshift for 99.3 +/- 0.1% of the objects. Thus, for this optical quasar sample, we can virtually eliminate catastrophic photometric redshift estimates. In addition to the SDSS sample, we incorporate ultraviolet photometry from the Third Data Release of the Galaxy Evolution Explorer All-Sky Imaging Survey (GALEX AIS GR3) to create PDFs for objects seen in both surveys. For quasars, the increased coverage of the observed frame UV of the SED results in significant improvement over the full SDSS sample, with sigma = 0.234 +/- 0.010. We demonstrate that this improvement is genuine. [Abridged]
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا