Fast access to large catalogs is required for some astronomical applications. Here we introduce the catsHTM tool, consisting of several large catalogs reformatted into an HDF5-based file format, which can be downloaded and used locally. To allow fast access, the catalogs are partitioned into hierarchical triangular meshes and stored in HDF5 files. Several tools are provided to perform efficient cone searches at resolutions spanning from a few arc seconds to degrees, within a few milliseconds. The first released version includes the following catalogs (in alphabetical order): 2MASS, 2MASS extended sources, AKARI, APASS, Cosmos, DECaLS/DR5, FIRST, GAIA/DR1, GAIA/DR2, GALEX/DR6Plus7, HSC/v2, IPHAS/DR2, NED redshifts, NVSS, Pan-STARRS1/DR1, PTF photometric catalog, ROSAT faint source, SDSS sources, SDSS/DR14 spectroscopy, Spitzer/SAGE, Spitzer/IRAC galactic center, UCAC4, UKIDSS/DR10, VST/ATLAS/DR3, VST/KiDS/DR3, WISE and XMM. We provide Python code for performing cone searches, as well as MATLAB code for performing cone searches, catalog cross-matching, and general searches, and for loading and creating these catalogs.
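As an illustration of what a cone search computes, the following is a minimal sketch in plain Python. It is not the catsHTM API: the function names and the in-memory list of (RA, Dec) tuples are assumptions made for this example. A mesh-partitioned catalog would first restrict the candidates to the few triangles that overlap the cone; the sketch shows only the final angular-distance filter, applied here to every source.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians (haversine formula).

    All inputs are in radians.
    """
    d_ra = ra2 - ra1
    d_dec = dec2 - dec1
    a = (math.sin(d_dec / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(d_ra / 2.0) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2.0 * math.asin(min(1.0, math.sqrt(a)))


def cone_search(sources, ra0_deg, dec0_deg, radius_arcsec):
    """Return the (ra_deg, dec_deg) tuples lying within the search cone.

    `sources` is a hypothetical in-memory list; a real tool would read only
    the relevant mesh cells from disk before applying this filter.
    """
    ra0 = math.radians(ra0_deg)
    dec0 = math.radians(dec0_deg)
    radius = math.radians(radius_arcsec / 3600.0)
    return [
        (ra, dec)
        for ra, dec in sources
        if angular_separation(ra0, dec0,
                              math.radians(ra), math.radians(dec)) <= radius
    ]


if __name__ == "__main__":
    catalog = [(10.0, 20.0), (10.001, 20.0), (10.1, 20.0)]
    print(cone_search(catalog, 10.0, 20.0, 5.0))
```

The brute-force filter scales linearly with catalog size, which is why a spatial partition is needed before it for catalogs of the sizes listed above.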
The NASA/IPAC Extragalactic Database (NED) has deployed a new rule-based cross-matching algorithm called Match Expert (MatchEx), capable of cross-matching very large catalogs (VLCs) with >10 million objects. MatchEx goes beyond traditional position-based cross-matching algorithms by using other available data together with expert logic to determine which candidate match is the best. Furthermore, the local background density of sources is used to determine and minimize the false-positive match rate and to estimate match completeness. The logical outcome and statistical probability of each match decision are stored in the database, and may be used to tune the algorithm and adjust match parameter thresholds. For our first production run, we cross-matched the GALEX All Sky Survey Catalog (GASC), containing nearly 40 million NUV-detected sources, against a directory of 180 million objects in NED. Candidate matches were identified for each GASC source within a 7.5 arcsecond radius. These candidates were filtered on position-based matching probability, and on other criteria including object type and object name. We estimate a match completeness of 97.6% and a match accuracy of 99.75%. MatchEx is being used to cross-match over 2 billion catalog sources to NED, including the Spitzer Source List, the 2MASS Point-Source Catalog, AllWISE, and SDSS DR10. It will also speed up routine cross-matching of sources as part of the NED literature pipeline.
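To make the role of background density concrete, here is a small sketch of the textbook chance-coincidence estimate. It is not the MatchEx formula, which the abstract does not give. It assumes unrelated sources are scattered as a uniform Poisson process with surface density rho, so the probability of at least one chance source within radius r is 1 - exp(-pi r^2 rho). The function names and the all-sky mean density used in the example are assumptions for illustration; a real pipeline would use a locally measured density.

```python
import math

# Area of the full sky in square arcseconds: 4*pi steradians converted
# via (180/pi)^2 square degrees per steradian and 3600^2 arcsec^2 per deg^2.
FULL_SKY_SQ_ARCSEC = 4.0 * math.pi * (180.0 / math.pi) ** 2 * 3600.0 ** 2


def chance_match_probability(radius_arcsec, density_per_sq_arcsec):
    """Probability of >= 1 unrelated source inside the search radius.

    Assumes a uniform Poisson background (an assumption of this sketch).
    """
    expected = math.pi * radius_arcsec ** 2 * density_per_sq_arcsec
    return 1.0 - math.exp(-expected)


def mean_all_sky_density(n_objects):
    """Crude mean surface density if objects were spread evenly over the sky."""
    return n_objects / FULL_SKY_SQ_ARCSEC


if __name__ == "__main__":
    # Illustrative only: 180 million objects averaged over the whole sky,
    # searched with the 7.5 arcsecond radius quoted in the abstract.
    rho = mean_all_sky_density(180e6)
    print(chance_match_probability(7.5, rho))
```

Because real source densities vary strongly across the sky, the all-sky average only indicates the order of magnitude; it is the local density that determines the false-positive rate for any given candidate.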
Time series data of celestial objects are commonly used in time domain astronomy to study valuable and unexpected objects such as extrasolar planets and supernovae. Due to the rapid growth of data volume, traditional manual methods are becoming infeasible for continuously analyzing accumulated observation data. To meet such demands, we designed and implemented a special tool named AstroCatR that can efficiently and flexibly reconstruct time series data from large-scale astronomical catalogues. AstroCatR can load original catalogue data from Flexible Image Transport System (FITS) files or databases, match each item to determine which object it belongs to, and finally produce time series datasets. To support the high-performance parallel processing of large-scale datasets, AstroCatR uses the extract-transform-load (ETL) preprocessing module to create sky zone files and balance the workload. The matching module uses the overlapped indexing method and an in-memory reference table to improve accuracy and performance. The output of AstroCatR can be stored in CSV files or be transformed into other formats as needed. In addition, the module-based software architecture ensures the flexibility and scalability of AstroCatR. We evaluated AstroCatR with actual observation data from the three Antarctic Survey Telescopes (AST3). The experiments demonstrate that AstroCatR can efficiently and flexibly reconstruct all time series data by setting relevant parameters and configuration files. Furthermore, the tool is approximately 3X faster than methods using relational database management systems at matching massive catalogues.
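The matching step can be pictured with a short sketch. This is not AstroCatR's code; the zone height, match radius, data layout, and all function names are assumptions chosen for illustration. The idea shown is one plausible reading of zone-based matching with overlapped indexing: the sky is cut into declination zones, each reference object is also indexed in a neighbouring zone when it lies within one match radius of the boundary, and each detection is then compared only against the in-memory entries of its own zone.

```python
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 0.5             # hypothetical zone height
MATCH_RADIUS_DEG = 2.0 / 3600.0   # hypothetical match radius (2 arcsec)


def zone_of(dec_deg):
    """Integer index of the declination zone containing dec_deg."""
    return int(math.floor((dec_deg + 90.0) / ZONE_HEIGHT_DEG))


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def build_overlapped_index(reference):
    """Map zone -> reference entries, duplicating objects near zone edges.

    `reference` maps object id -> (ra_deg, dec_deg).
    """
    index = defaultdict(list)
    for obj_id, (ra, dec) in reference.items():
        zones = {zone_of(dec - MATCH_RADIUS_DEG),
                 zone_of(dec),
                 zone_of(dec + MATCH_RADIUS_DEG)}
        for zone in zones:
            index[zone].append((obj_id, ra, dec))
    return index


def reconstruct_time_series(detections, reference):
    """Group detections (time, ra_deg, dec_deg, mag) by nearest reference object."""
    index = build_overlapped_index(reference)
    series = defaultdict(list)
    for time, ra, dec, mag in detections:
        best_id, best_sep = None, MATCH_RADIUS_DEG
        # Thanks to the overlap, only the detection's own zone is searched.
        for obj_id, ref_ra, ref_dec in index.get(zone_of(dec), []):
            sep = separation_deg(ra, dec, ref_ra, ref_dec)
            if sep <= best_sep:
                best_id, best_sep = obj_id, sep
        if best_id is not None:
            series[best_id].append((time, mag))
    for points in series.values():
        points.sort()  # order each light curve by observation time
    return dict(series)
```

In this sketch a detection that falls just across a zone boundary from its reference object is still matched, because the object was indexed in both zones. Zones also give a natural unit for distributing work across parallel workers.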
Astronomical observation data require long-term preservation, and the rapid accumulation of observation data makes it necessary to consider the cost of long-term archive storage. In addition to low-speed disk-based online storage, optical disk or tape-based offline storage can be used to save costs. However, for astronomical research that requires historical data (particularly time-domain astronomy), the performance and energy consumption of data-accessing techniques cause problems because the requested data (which are organized according to observation time) may be located across multiple storage devices. In this study, we design and develop a tool referred to as AstroLayout to redistribute the observation data using spatial aggregation. The core algorithm uses graph partitioning to generate an optimized data placement according to the original observation data statistics and the target storage system. For the given observation data, AstroLayout can copy the long-term archive in the target storage system in accordance with this placement. An efficiency evaluation shows that AstroLayout can reduce the number of devices activated when responding to data-access requests in time-domain astronomy research. In addition to improving the performance of data-accessing techniques, AstroLayout can also reduce the storage system's power consumption. For enhanced adaptability, it supports storage systems of any media, including optical disks, tapes, and hard disks.
We present a new parallel implementation of the PINpointing Orbit Crossing-Collapsed HIerarchical Objects (PINOCCHIO) algorithm, a quick tool, based on Lagrangian Perturbation Theory, for the hierarchical build-up of Dark Matter halos in cosmological volumes. To assess its ability to predict halo correlations on large scales, we compare its results with those of an N-body simulation of a 3 Gpc/h box sampled with 2048^3 particles taken from the MICE suite, matching the same seeds for the initial conditions. Thanks to the FFTW libraries and to the relatively simple design, the code shows very good scaling properties. The CPU time required by PINOCCHIO is a tiny fraction (~1/2000) of that required by the MICE simulation. Varying some of PINOCCHIO's numerical parameters allows one to produce a universal mass function that lies in the range allowed by published fits, although it underestimates the MICE mass function of FoF halos in the high mass tail. We compare the matter-halo and the halo-halo power spectra with those of the MICE simulation and find that these 2-point statistics are well recovered on large scales. In particular, when catalogs are matched in number density, agreement within ten per cent is achieved for the halo power spectrum. At scales k>0.1 h/Mpc, the inaccuracy of the Zeldovich approximation in locating halo positions causes an underestimate of the power spectrum that can be modeled as a Gaussian factor with a damping scale of d=3 Mpc/h at z=0, decreasing at higher redshift. Finally, a remarkable match is obtained for the reduced halo bispectrum, showing a good description of nonlinear halo bias. Our results demonstrate the potential of PINOCCHIO as an accurate and flexible tool for generating large ensembles of mock galaxy surveys, with interesting applications for the analysis of large galaxy redshift surveys.
I describe a new, open-source astronomical image-fitting program called Imfit, specialized for galaxies but potentially useful for other sources, which is fast, flexible, and highly extensible. A key characteristic of the program is an object-oriented design which allows new types of image components (2D surface-brightness functions) to be easily written and added to the program. Image functions provided with Imfit include the usual suspects for galaxy decompositions (Sersic, exponential, Gaussian), along with Core-Sersic and broken-exponential profiles, elliptical rings, and three components which perform line-of-sight integration through 3D luminosity-density models of disks and rings seen at arbitrary inclinations. Available minimization algorithms include Levenberg-Marquardt, Nelder-Mead simplex, and Differential Evolution, allowing trade-offs between speed and decreased sensitivity to local minima in the fit landscape. Minimization can be done using the standard chi^2 statistic (using either data or model values to estimate per-pixel Gaussian errors, or else user-supplied error images) or Poisson-based maximum-likelihood statistics; the latter approach is particularly appropriate for cases of Poisson data in the low-count regime. I show that fitting low-S/N galaxy images using chi^2 minimization and individual-pixel Gaussian uncertainties can lead to significant biases in fitted parameter values, which are avoided if a Poisson-based statistic is used; this is true even when Gaussian read noise is present.
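The low-count bias can be seen in the simplest possible case, a constant model fitted to a handful of pixel counts. The sketch below is not Imfit code; the function names and the sample counts are made up for illustration. It compares chi^2 with data-based variances (sigma_i^2 = d_i, which requires d_i > 0) against the Cash statistic, a common Poisson-based maximum-likelihood statistic. For a constant model both minima have closed forms: the chi^2 minimum is the harmonic mean of the counts, while the Cash minimum is the arithmetic mean.

```python
import math


def chi2_data_errors(counts, model):
    """chi^2 with per-pixel Gaussian variances taken from the data (d_i > 0)."""
    return sum((d - model) ** 2 / d for d in counts)


def cash_statistic(counts, model):
    """Cash statistic, C = 2 * sum(m - d * ln m), for a constant model m."""
    return 2.0 * sum(model - d * math.log(model) for d in counts)


def best_constant_chi2(counts):
    """Minimiser of chi2_data_errors: setting sum(1 - m/d_i) = 0 gives
    m = N / sum(1/d_i), the harmonic mean of the counts."""
    return len(counts) / sum(1.0 / d for d in counts)


def best_constant_cash(counts):
    """Minimiser of cash_statistic: setting N - sum(d_i)/m = 0 gives
    m = mean(d_i), the arithmetic mean of the counts."""
    return sum(counts) / len(counts)


if __name__ == "__main__":
    counts = [1, 2, 3, 4, 5, 3, 2, 4]  # made-up low-count pixel values
    print("chi^2 (data errors) estimate:", best_constant_chi2(counts))
    print("Cash (Poisson ML) estimate:  ", best_constant_cash(counts))
```

Since the harmonic mean never exceeds the arithmetic mean, the chi^2 estimate with data-based errors is pulled low whenever the counts scatter, and the effect is largest when the counts are small. This is a toy version of the bias described above, not a reproduction of the paper's tests.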