Local Two-Sample Testing: A New Tool for Analysing High-Dimensional Astronomical Data

Added by Peter E. Freeman

Publication date 2017

fields Physics

Language English

Authors P. E. Freeman - I. Kim -

Instrumentation and Methods for Astrophysics

Modern surveys have provided the astronomical community with a flood of high-dimensional data, but analyses of these data often occur after their projection to lower-dimensional spaces. In this work, we introduce a local two-sample hypothesis test framework that an analyst may directly apply to data in their native space. In this framework, the analyst defines two classes based on a response variable of interest (e.g. higher-mass galaxies versus lower-mass galaxies) and determines at arbitrary points in predictor space whether the local proportions of objects that belong to the two classes differ significantly from the global proportions. Our framework has myriad potential uses throughout astronomy; here, we demonstrate its efficacy by applying it to a sample of 2487 i-band-selected galaxies observed by the HST ACS in four of the CANDELS program fields. For each galaxy, we have seven morphological summary statistics along with an estimated stellar mass and star-formation rate (SFR). We perform two studies: one in which we determine regions of the seven-dimensional space of morphological statistics where high-mass galaxies are significantly more numerous than low-mass galaxies, and vice versa, and another in which we use SFR in place of mass. We find that we are able to identify such regions, and show that high-mass/low-SFR regions are associated with concentrated and undisturbed galaxies, while galaxies in low-mass/high-SFR regions appear more extended and/or disturbed than their high-mass/low-SFR counterparts.
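The idea of comparing a local class proportion with the global one can be illustrated with a minimal sketch. It assumes a k-nearest-neighbour neighbourhood and an exact binomial test, which are illustrative choices and not necessarily the procedure used in the paper; the function names, the value of `k`, and the synthetic two-dimensional data are all hypothetical.

```python
import math
import random

def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))

def local_two_sample_test(X, labels, x0, k=30):
    """Compare the class-1 proportion among the k nearest neighbours
    of x0 with the global class-1 proportion (assumed kNN scheme)."""
    p_global = sum(labels) / len(labels)
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x0))
    n_local = sum(labels[i] for i in order[:k])
    return n_local / k, p_global, binom_two_sided_p(n_local, k, p_global)

# Synthetic data: class 0 centred on (0, 0), class 1 centred on (2, 2).
random.seed(0)
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
X += [(random.gauss(2, 1), random.gauss(2, 1)) for _ in range(200)]
labels = [0] * 200 + [1] * 200

p_local, p_global, p_value = local_two_sample_test(X, labels, (2.0, 2.0))
print(p_local, p_global, p_value)
```

When such a test is repeated at many points in predictor space, the resulting p-values would need a multiple-testing correction before any region is declared significant.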

A Redistribution Tool for Long-Term Archive of Astronomical Observation Data

Chao Sun, Ce Yu, Chenzhou Cui, 2020

Astronomical observation data require long-term preservation, and the rapid accumulation of observation data makes it necessary to consider the cost of long-term archive storage. In addition to low-speed disk-based online storage, optical disk or tape-based offline storage can be used to save costs. However, for astronomical research that requires historical data (particularly time-domain astronomy), the performance and energy consumption of data-accessing techniques cause problems because the requested data (which are organized according to observation time) may be located across multiple storage devices. In this study, we design and develop a tool referred to as AstroLayout to redistribute the observation data using spatial aggregation. The core algorithm uses graph partitioning to generate an optimized data placement according to the original observation data statistics and the target storage system. For the given observation data, AstroLayout can copy the long-term archive to the target storage system in accordance with this placement. An efficiency evaluation shows that AstroLayout can reduce the number of devices activated when responding to data-access requests in time-domain astronomy research. In addition to improving the performance of data-accessing techniques, AstroLayout can also reduce the storage system's power consumption. For enhanced adaptability, it supports storage systems of any media, including optical disks, tapes, and hard disks.

Instrumentation and Methods for Astrophysics

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

Feng Liu, Wenkai Xu, Jie Lu, 2021

Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions with appropriate learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, assuming a considerable number of observed samples from each distribution. In realistic scenarios with very limited numbers of data samples, however, it can be challenging to identify a kernel powerful enough to distinguish complex distributions. We address this issue by introducing the problem of meta two-sample testing (M2ST), which aims to exploit (abundant) auxiliary data on related tasks to find an algorithm that can quickly identify a powerful test on new target tasks. We propose two specific algorithms for this task: a generic scheme which improves over baselines and a more tailored approach which performs even better. We provide both theoretical justification and empirical evidence that our proposed meta-testing schemes outperform learning kernel-based tests directly from scarce observations, and identify when such schemes will be successful.

Machine Learning, Artificial Intelligence

catsHTM - A tool for fast accessing and cross-matching large astronomical catalogs

Maayane T. Soumagnac, Eran O. Ofek, 2018

Fast access to large catalogs is required for some astronomical applications. Here we introduce the catsHTM tool, consisting of several large catalogs reformatted into an HDF5-based file format, which can be downloaded and used locally. To allow fast access, the catalogs are partitioned into hierarchical triangular meshes and stored in HDF5 files. Several tools are provided to perform efficient cone searches at resolutions spanning from a few arcseconds to degrees, within a few milliseconds. The first released version includes the following catalogs (in alphabetical order): 2MASS, 2MASS extended sources, AKARI, APASS, Cosmos, DECaLS/DR5, FIRST, GAIA/DR1, GAIA/DR2, GALEX/DR6Plus7, HSC/v2, IPHAS/DR2, NED redshifts, NVSS, Pan-STARRS1/DR1, PTF photometric catalog, ROSAT faint source, SDSS sources, SDSS/DR14 spectroscopy, Spitzer/SAGE, Spitzer/IRAC galactic center, UCAC4, UKIDSS/DR10, VST/ATLAS/DR3, VST/KiDS/DR3, WISE and XMM. We provide Python code for performing cone searches, as well as MATLAB code for performing cone searches, catalog cross-matching, and general searches, and for loading and creating these catalogs.
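For readers unfamiliar with the term, a cone search returns every catalog source within a given angular radius of a sky position. The sketch below is a brute-force version over an in-memory list, assuming coordinates in degrees and the haversine formula for angular separation; it does not use the catsHTM API or its hierarchical triangular mesh index, and the function names and the tiny catalog are hypothetical.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).
    All inputs are in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(catalog, ra0, dec0, radius_arcsec):
    """Return rows (ra, dec, name) within radius_arcsec of (ra0, dec0).
    Brute force: checks every row, with no spatial index."""
    radius_deg = radius_arcsec / 3600.0
    return [row for row in catalog
            if angular_separation_deg(row[0], row[1], ra0, dec0) <= radius_deg]

# Hypothetical catalog rows: (RA in degrees, Dec in degrees, name).
catalog = [
    (10.0, 20.0, "a"),
    (10.001, 20.0, "b"),
    (10.5, 20.0, "c"),
    (190.0, -20.0, "d"),
]

matches = cone_search(catalog, 10.0, 20.0, 5.0)
print([name for _, _, name in matches])
```

A linear scan like this costs time proportional to the catalog size, which is why a spatial partition such as the hierarchical triangular mesh described above matters for large catalogs.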

Instrumentation and Methods for Astrophysics

A new tool for two-dimensional field-reversed configuration equilibrium study

Haojie Ma, Huasheng Xie, Bihe Deng, 2021

A new tool (GSEQ-FRC) is developed for solving the two-dimensional (2D) equilibrium of a field-reversed configuration (FRC) under fixed-boundary and free-boundary conditions, with external coils included. Benefiting from the two-parameter modified rigid rotor (MRR) radial equilibrium model and the numerical approaches presented by [Ma et al, Nucl. Fusion, 61, 036046, 2021], GSEQ-FRC is used to study the equilibrium properties of the FRC quantitatively and will be used for fast FRC equilibrium reconstruction. In GSEQ-FRC, the FRC equilibrium can be conveniently determined by two parameters, i.e., the ratio between thermal pressure and magnetic pressure at the separatrix, $\beta_s$, and the normalized scrape-off layer (SOL) width, $\delta_s$. Examples with fixed and free boundary conditions are given to demonstrate the capability of GSEQ-FRC in equilibrium calculations. This new tool is used to quantitatively study the factors affecting the shape of the FRC separatrix, revealing how the FRC changes from racetrack-like to ellipse-like.

Plasma Physics

Two-Sample Tests for High Dimensional Means with Thresholding and Data Transformation

Song Xi Chen, Jun Li, Ping-Shou Zhong, 2014

We consider testing for two-sample means of high dimensional populations by thresholding. Two tests are investigated, which are designed for better power performance when the two population mean vectors differ only in sparsely populated coordinates. The first test is constructed by carrying out thresholding to remove the non-signal-bearing dimensions. The second test combines data transformation via the precision matrix with the thresholding. The benefits of the thresholding and the data transformations are shown by a reduced variance of the test thresholding statistics, the improved power and a wider detection region of the tests. Simulation experiments and an empirical study are performed to confirm the theoretical findings and to demonstrate the practical implementations.

Methodology