No Arabic abstract
This review outlines concepts of mathematical statistics, elements of probability theory, hypothesis tests and point estimation for use in the analysis of modern astronomical data. Least squares, maximum likelihood, and Bayesian approaches to statistical inference are treated. Resampling methods, particularly the bootstrap, provide valuable procedures when distributions functions of statistics are not known. Several approaches to model selection and good- ness of fit are considered. Applied statistics relevant to astronomical research are briefly discussed: nonparametric methods for use when little is known about the behavior of the astronomical populations or processes; data smoothing with kernel density estimation and nonparametric regression; unsupervised clustering and supervised classification procedures for multivariate problems; survival analysis for astronomical datasets with nondetections; time- and frequency-domain times series analysis for light curves; and spatial statistics to interpret the spatial distributions of points in low dimensions. Two types of resources are presented: about 40 recommended texts and monographs in various fields of statistics, and the public domain R software system for statistical analysis. Together with its sim 3500 (and growing) add-on CRAN packages, R implements a vast range of statistical procedures in a coherent high-level language with advanced graphics.
Celestial objects exhibit a wide range of variability in brightness at different wavebands. Surprisingly, the most common methods for characterizing time series in statistics -- parametric autoregressive modeling -- is rarely used to interpret astronomical light curves. We review standard ARMA, ARIMA and ARFIMA (autoregressive moving average fractionally integrated) models that treat short-memory autocorrelation, long-memory $1/f^alpha$ `red noise, and nonstationary trends. Though designed for evenly spaced time series, moderately irregular cadences can be treated as evenly-spaced time series with missing data. Fitting algorithms are efficient and software implementations are widely available. We apply ARIMA models to light curves of four variable stars, discussing their effectiveness for different temporal characteristics. A variety of extensions to ARIMA are outlined, with emphasis on recently developed continuous-time models like CARMA and CARFIMA designed for irregularly spaced time series. Strengths and weakness of ARIMA-type modeling for astronomical data analysis and astrophysical insights are reviewed.
High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity and mechanisms underlying human health and disease. Large-scale metabolomics data, generated using targeted or nontargeted platforms, are increasingly more common. Appropriate statistical analysis of these complex high-dimensional data is critical for extracting meaningful results from such large-scale human metabolomics studies. Herein, we consider the main statistical analytical approaches that have been employed in human metabolomics studies. Based on the lessons learned and collective experience to date in the field, we propose a step-by-step framework for pursuing statistical analyses of human metabolomics data. We discuss the range of options and potential approaches that may be employed at each stage of data management, analysis, and interpretation, and offer guidance on analytical considerations that are important for implementing an analysis workflow. Certain pervasive analytical challenges facing human metabolomics warrant ongoing research. Addressing these challenges will allow for more standardization in the field and lead to analytical advances in metabolomics investigations with the potential to elucidate novel mechanisms underlying human health and disease.
Background. Emerging technologies now allow for mass spectrometry based profiling of up to thousands of small molecule metabolites (metabolomics) in an increasing number of biosamples. While offering great promise for revealing insight into the pathogenesis of human disease, standard approaches have yet to be established for statistically analyzing increasingly complex, high-dimensional human metabolomics data in relation to clinical phenotypes including disease outcomes. To determine optimal statistical approaches for metabolomics analysis, we sought to formally compare traditional statistical as well as newer statistical learning methods across a range of metabolomics dataset types. Results. In simulated and experimental metabolomics data derived from large population-based human cohorts, we observed that with an increasing number of study subjects, univariate compared to multivariate methods resulted in a higher false discovery rate due to substantial correlations among metabolites. In scenarios wherein the number of assayed metabolites increases, as in the application of nontargeted versus targeted metabolomics measures, multivariate methods performed especially favorably across a range of statistical operating characteristics. In nontargeted metabolomics datasets that included thousands of metabolite measures, sparse multivariate models demonstrated greater selectivity and lower potential for spurious relationships. Conclusion. When the number of metabolites was similar to or exceeded the number of study subjects, as is common with nontargeted metabolomics analysis of relatively small sized cohorts, sparse multivariate models exhibited the most robust statistical power with more consistent results. These findings have important implications for the analysis of metabolomics studies of human disease.
Amateur contributions to professional publications have increased exponentially over the last decades in the field of Planetary Astronomy. Here we review the different domains of the field in which collaborations between professional and amateur astronomers are effective and regularly lead to scientific publications. We discuss the instruments, detectors, softwares and methodologies typically used by amateur astronomers to collect the scientific data in the different domains of interest. Amateur contributions to the monitoring of planets and interplanetary matter, characterization of asteroids and comets, as well as the determination of the physical properties of Kuiper Belt Objects and exoplanets are discussed.
This paper proposed a new probability distribution named as inverse xgamma distribution (IXGD). Different mathematical and statistical properties,viz., reliability characteristics, moments, inverse moments, stochastic ordering and order statistics of the proposed distribution have been derived and discussed. The estimation of the parameter of IXGD has been approached by different methods of estimation, namely, maximum likelihood method of estimation (MLE), Least square method of estimation (LSE), Weighted least square method of estimation (WLSE), Cramer-von-Mises method of estimation (CME) and maximum product spacing method of estimation (MPSE). Asymptotic confidence interval (ACI) of the parameter is also obtained. A simulation study has been carried out to compare the performance of the obtained estimators and corresponding ACI in terms of average widths and corresponding coverage probabilities. Finally, two real data sets have been used to demonstrate the applicability of IXGD in real life situations.