Modern surveys have provided the astronomical community with a flood of high-dimensional data, but analyses of these data often occur after their projection to lower-dimensional spaces. In this work, we introduce a local two-sample hypothesis test framework that an analyst may directly apply to data in their native space. In this framework, the analyst defines two classes based on a response variable of interest (e.g., higher-mass galaxies versus lower-mass galaxies) and determines at arbitrary points in predictor space whether the local proportion of objects belonging to each class differs significantly from the global proportion. Our framework has myriad potential uses throughout astronomy; here, we demonstrate its efficacy by applying it to a sample of 2487 i-band-selected galaxies observed by the HST ACS in four of the CANDELS program fields. For each galaxy, we have seven morphological summary statistics along with an estimated stellar mass and star-formation rate (SFR). We perform two studies: one in which we determine regions of the seven-dimensional space of morphological statistics where high-mass galaxies are significantly more numerous than low-mass galaxies, and vice versa, and another in which we use SFR in place of mass. We find that we are able to identify such regions, and show that high-mass/low-SFR regions are associated with concentrated and undisturbed galaxies, while galaxies in low-mass/high-SFR regions appear more extended and/or disturbed than their high-mass/low-SFR counterparts.
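The idea of comparing a local class proportion with the global one can be illustrated with a minimal sketch. The abstract does not state how the local proportion is estimated or how significance is assessed, so the k-nearest-neighbour estimate and the exact binomial test below are assumptions made purely for illustration, and the function names `binom_two_sided_p` and `local_proportion_test` are hypothetical.

```python
import math


def binom_two_sided_p(successes, n, p):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed one."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[successes]
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


def local_proportion_test(points, labels, query, k):
    """Compare the class-1 proportion among the k nearest neighbours of
    `query` with the global class-1 proportion (illustrative choice of
    estimator and test, not taken from the abstract)."""
    global_p = sum(labels) / len(labels)
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    local_count = sum(labels[i] for i in order[:k])
    p_value = binom_two_sided_p(local_count, k, global_p)
    return local_count / k, global_p, p_value


# Toy data: class 1 clustered near the origin, class 0 far away.
points = [(0.1 * i, 0.0) for i in range(10)] + [(5 + 0.1 * i, 5.0) for i in range(10)]
labels = [1] * 10 + [0] * 10
local_p, global_p, p_value = local_proportion_test(points, labels, (0.0, 0.0), k=5)
```

In practice the test would be repeated at many query points, so a correction for multiple comparisons would also be needed; that step is omitted here.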
Astronomical observation data require long-term preservation, and the rapid accumulation of observation data makes it necessary to consider the cost of long-term archive storage. In addition to low-speed disk-based online storage, optical disk or tape-based offline storage can be used to save costs. However, for astronomical research that requires historical data (particularly time-domain astronomy), the performance and energy consumption of data-accessing techniques cause problems because the requested data (which are organized according to observation time) may be located across multiple storage devices. In this study, we design and develop a tool referred to as AstroLayout to redistribute the observation data using spatial aggregation. The core algorithm uses graph partitioning to generate an optimized data placement according to the original observation data statistics and the target storage system. For the given observation data, AstroLayout can copy the long-term archive to the target storage system in accordance with this placement. An efficiency evaluation shows that AstroLayout can reduce the number of devices activated when responding to data-access requests in time-domain astronomy research. In addition to improving the performance of data-accessing techniques, AstroLayout can also reduce the storage system's power consumption. For enhanced adaptability, it supports storage systems of any media, including optical disks, tapes, and hard disks.
Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions with appropriate learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, assuming a considerable number of observed samples from each distribution. In realistic scenarios with very limited numbers of data samples, however, it can be challenging to identify a kernel powerful enough to distinguish complex distributions. We address this issue by introducing the problem of meta two-sample testing (M2ST), which aims to exploit (abundant) auxiliary data on related tasks to find an algorithm that can quickly identify a powerful test on new target tasks. We propose two specific algorithms for this task: a generic scheme which improves over baselines and a more tailored approach which performs even better. We provide both theoretical justification and empirical evidence that our proposed meta-testing schemes outperform learning kernel-based tests directly from scarce observations, and identify when such schemes will be successful.
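For context, a kernel two-sample test with a fixed (not learned) kernel can be sketched as follows. This is only the kind of baseline that kernel learning and meta-testing aim to improve on, not the M2ST algorithms themselves; the Gaussian kernel, the bandwidth value, the biased MMD estimate, the permutation calibration, and all function names are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * bandwidth**2))


def mmd2_biased(xs, ys, bandwidth):
    """Biased estimate of the squared maximum mean discrepancy (MMD)."""
    def mean_k(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def mmd_permutation_test(xs, ys, bandwidth=1.0, n_perm=200, seed=0):
    """Permutation p-value for H0: both samples share one distribution."""
    rng = random.Random(seed)
    observed = mmd2_biased(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = mmd2_biased(pooled[: len(xs)], pooled[len(xs):], bandwidth)
        if permuted >= observed:
            hits += 1
    return observed, (1 + hits) / (1 + n_perm)


# Toy samples: two well-separated clusters on a line.
sample_a = [(0.1 * i,) for i in range(8)]
sample_b = [(10.0 + 0.1 * i,) for i in range(8)]
statistic, p_value = mmd_permutation_test(sample_a, sample_b)
```

With only a handful of points per sample, the power of such a test depends heavily on the bandwidth, which is the difficulty the abstract describes for the scarce-data regime.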
Fast access to large catalogs is required for some astronomical applications. Here we introduce the catsHTM tool, consisting of several large catalogs reformatted into an HDF5-based file format, which can be downloaded and used locally. To allow fast access, the catalogs are partitioned into hierarchical triangular meshes and stored in HDF5 files. Several tools are provided to perform efficient cone searches at resolutions spanning from a few arcseconds to degrees, within a few milliseconds. The first released version includes the following catalogs (in alphabetical order): 2MASS, 2MASS extended sources, AKARI, APASS, Cosmos, DECaLS/DR5, FIRST, GAIA/DR1, GAIA/DR2, GALEX/DR6Plus7, HSC/v2, IPHAS/DR2, NED redshifts, NVSS, Pan-STARRS1/DR1, PTF photometric catalog, ROSAT faint source, SDSS sources, SDSS/DR14 spectroscopy, Spitzer/SAGE, Spitzer/IRAC galactic center, UCAC4, UKIDSS/DR10, VST/ATLAS/DR3, VST/KiDS/DR3, WISE and XMM. We provide Python code for performing cone searches, as well as MATLAB code for performing cone searches, catalog cross-matching, and general searches, and for loading and creating these catalogs.
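To make the term concrete, a cone search returns every catalog source within a given angular radius of a sky position. The sketch below is a brute-force version over an in-memory list and does not use hierarchical triangular meshes or HDF5, so it is not the catsHTM implementation or its API; the function names and the toy catalog are hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions
    (haversine form, numerically stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra0, dec0, radius_arcsec):
    """Indices of catalog entries (ra, dec in degrees) within the radius.
    Checks every source, so the cost is linear in catalog size."""
    radius_deg = radius_arcsec / 3600.0
    return [
        i
        for i, (ra, dec) in enumerate(catalog)
        if angular_separation_deg(ra0, dec0, ra, dec) <= radius_deg
    ]


# Toy catalog: two sources 3.6 arcsec apart and one far away.
toy_catalog = [(10.0, 20.0), (10.0, 20.001), (50.0, -30.0)]
matches = cone_search(toy_catalog, 10.0, 20.0, radius_arcsec=5.0)
```

A spatial index such as a hierarchical triangular mesh avoids the linear scan by restricting the distance checks to sources stored in the few mesh cells that overlap the cone.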
A new tool, GSEQ-FRC, is developed for solving the two-dimensional (2D) equilibrium of a field-reversed configuration (FRC) under fixed-boundary and free-boundary conditions, with external coils included. Benefiting from the two-parameter modified rigid rotor (MRR) radial equilibrium model and the numerical approaches presented by [Ma et al, Nucl. Fusion, 61, 036046, 2021], GSEQ-FRC is used to study the equilibrium properties of the FRC quantitatively and will be used for fast FRC equilibrium reconstruction. In GSEQ-FRC, the FRC equilibrium can be conveniently determined by two parameters, i.e., the ratio between thermal pressure and magnetic pressure at the separatrix, $\beta_s$, and the normalized scrape-off layer (SOL) width, $\delta_s$. Examples with fixed and free boundary conditions are given to demonstrate the capability of GSEQ-FRC in the equilibrium calculations. This new tool is used to quantitatively study the factors affecting the shape of the FRC separatrix, revealing how the FRC changes from racetrack-like to ellipse-like.
We consider testing for two-sample means of high-dimensional populations by thresholding. Two tests are investigated, which are designed for better power performance when the two population mean vectors differ only in sparsely populated coordinates. The first test is constructed by carrying out thresholding to remove the non-signal-bearing dimensions. The second test combines data transformation via the precision matrix with the thresholding. The benefits of the thresholding and the data transformations are shown by a reduced variance of the thresholding test statistics, the improved power, and a wider detection region of the tests. Simulation experiments and an empirical study are performed to confirm the theoretical findings and to demonstrate the practical implementations.
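The thresholding step of the first test can be sketched roughly as follows. The abstract gives no formulas, so the per-coordinate squared t-statistics, the threshold level of the form 2 s log p, the centering by 1, and the function name `thresholded_mean_statistic` are all assumptions for illustration; the calibration of the null distribution and the precision-matrix transformation of the second test are omitted.

```python
import math
import statistics


def thresholded_mean_statistic(sample_x, sample_y, s=1.0):
    """Sum of centred squared t-statistics over the coordinates whose
    squared t-statistic exceeds the threshold 2 * s * log(p).
    Coordinates below the threshold are treated as noise and dropped."""
    p = len(sample_x[0])
    threshold = 2.0 * s * math.log(p)  # assumed form of the threshold
    total, kept = 0.0, []
    for j in range(p):
        xj = [row[j] for row in sample_x]
        yj = [row[j] for row in sample_y]
        se2 = statistics.variance(xj) / len(xj) + statistics.variance(yj) / len(yj)
        t2 = (statistics.mean(xj) - statistics.mean(yj)) ** 2 / se2
        if t2 > threshold:
            total += t2 - 1.0
            kept.append(j)
    return total, kept


# Toy data: four coordinates, only the first carries a mean shift.
base = [0.0, 1.0, 0.0, 1.0]
sample_x = [(v, v, v, v) for v in base]
sample_y = [(v + 10.0, v, v, v) for v in base]
statistic, kept = thresholded_mean_statistic(sample_x, sample_y)
```

Dropping the coordinates that fall below the threshold removes their noise contribution from the sum, which is the intuition behind the reduced variance mentioned in the abstract.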