In the sport of cricket, variations in a player's batting ability can usually be measured on one of two scales: short-term changes in ability that are observed during a single innings, and long-term changes that are witnessed between matches, over entire playing careers. To measure long-term variations, we derive a Bayesian parametric model that uses a Gaussian process to measure and predict how the batting abilities of international cricketers fluctuate between innings. The model is fitted using nested sampling, given its high dimensionality and for ease of model comparison. Generally speaking, the results support an anecdotal description of a typical sporting career. Young players tend to begin their careers with some raw ability, which improves over time as a result of coaching, experience and other external circumstances. Eventually, players reach the peak of their career, after which ability tends to decline. The model provides more accurate quantifications of current and future player batting abilities than traditional cricketing statistics, such as the batting average. The results allow us to identify which players are improving or deteriorating in terms of batting ability, which has practical implications for player comparison, talent identification and team selection policy.
The optical and UV variability of the majority of AGN may be related to the reprocessing of rapidly changing X-ray emission from a more compact region near the central black hole. Such a reprocessing model would be characterised by lags between X-ray and optical/UV emission due to differences in light travel time. Observationally, however, such lag features have been difficult to detect due to gaps in the lightcurves introduced through factors such as source visibility or limited telescope time. In this work, Gaussian process regression is employed to interpolate the gaps in the Swift X-ray and UV lightcurves of the narrow-line Seyfert 1 galaxy Mrk 335. In a simulation study of five commonly employed analytic Gaussian process kernels, we conclude that the Matérn 1/2 and rational quadratic kernels yield the most well-specified models for the X-ray and UVW2 bands of Mrk 335. In analysing the structure functions of the Gaussian process lightcurves, we obtain a broken power law with a breakpoint at 125 days in the UVW2 band. In the X-ray band, the structure function of the Gaussian process lightcurve is consistent with a power law in the case of the rational quadratic kernel, whilst a broken power law with a breakpoint at 66 days is obtained from the Matérn 1/2 kernel. The subsequent cross-correlation analysis is consistent with previous studies and, furthermore, shows tentative evidence for a broad X-ray-UV lag feature of up to 30 days in the lag-frequency spectrum, where the significance of the lag depends on the choice of Gaussian process kernel.
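As a rough illustration of how Gaussian process regression can fill a gap in an irregularly sampled lightcurve, the following minimal sketch uses a Matérn 1/2 (exponential) kernel with NumPy only. The lightcurve is synthetic, and the function names, amplitude, length scale and noise level are all hypothetical choices made for the example; they are not values from the Mrk 335 analysis.

```python
import numpy as np


def matern12_kernel(t1, t2, amplitude=1.0, length_scale=20.0):
    """Matern 1/2 (exponential) covariance: a^2 * exp(-|t - t'| / l)."""
    dt = np.abs(t1[:, None] - t2[None, :])
    return amplitude**2 * np.exp(-dt / length_scale)


def gp_interpolate(t_obs, y_obs, t_new, noise_std=0.1, **kernel_args):
    """Posterior mean and standard deviation of a zero-mean GP at t_new."""
    # Covariance of the observations, with white measurement noise added.
    K = matern12_kernel(t_obs, t_obs, **kernel_args)
    K += noise_std**2 * np.eye(len(t_obs))
    K_s = matern12_kernel(t_new, t_obs, **kernel_args)
    K_ss_diag = np.diag(matern12_kernel(t_new, t_new, **kernel_args))

    # Cholesky factorisation gives a numerically stable solve.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = K_ss_diag - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


# Synthetic daily lightcurve with an observing gap between days 80 and 120.
rng = np.random.default_rng(0)
t_all = np.arange(0.0, 200.0, 1.0)
keep = (t_all < 80) | (t_all > 120)
t_obs = t_all[keep]
y_obs = np.sin(t_obs / 15.0) + 0.1 * rng.standard_normal(t_obs.size)

# Interpolate across the gap.
t_gap = np.arange(80.0, 121.0, 1.0)
mean_gap, std_gap = gp_interpolate(t_obs, y_obs, t_gap)
```

The predictive standard deviation grows towards the middle of the gap, which is the uncertainty that any downstream structure-function or lag analysis would need to propagate. In practice the kernel hyperparameters would be fitted to the data rather than fixed by hand as they are here.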
Academic fields exhibit substantial levels of gender segregation. To date, most attempts to explain this persistent global phenomenon have relied on limited cross-sections of data from specific countries, fields, or career stages. Here we used a global longitudinal dataset assembled from profiles on ORCID.org to investigate which characteristics of a field predict gender differences among the academics who leave and join that field. Only two field characteristics consistently predicted such differences: (1) the extent to which a field values raw intellectual talent (brilliance) and (2) whether a field is in Science, Technology, Engineering, and Mathematics (STEM). Women more than men moved away from brilliance-oriented and STEM fields, and men more than women moved toward these fields. Our findings suggest that stereotypes associating brilliance and other STEM-relevant traits with men more than women play a key role in maintaining gender segregation across academia.
Atmospheric trace-gas inversion is the procedure by which the sources and sinks of a trace gas are identified from observations of its mole fraction at isolated locations in space and time. This is inherently a spatio-temporal bivariate inversion problem, since the mole-fraction field evolves in space and time and the flux is also spatio-temporally distributed. Further, the bivariate model is likely to be non-Gaussian since the flux field is rarely Gaussian. Here, we use conditioning to construct a non-Gaussian bivariate model, and we describe some of its properties through auto- and cross-cumulant functions. A bivariate non-Gaussian, specifically trans-Gaussian, model is then achieved through the use of Box--Cox transformations, and we facilitate Bayesian inference by approximating the likelihood in a hierarchical framework. Trace-gas inversion, especially at high spatial resolution, is frequently highly sensitive to prior specification. Therefore, unlike conventional approaches, we assimilate trace-gas inventory information with the observational data at the parameter layer, thus shifting prior sensitivity from the inventory itself to its spatial characteristics (e.g., its spatial length scale). We demonstrate the approach in controlled-experiment studies of methane inversion, using fluxes extracted from inventories of the UK and Ireland and of Northern Australia.
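For readers unfamiliar with the Box-Cox transformation mentioned above, the following minimal sketch shows the standard one-parameter form and its inverse. It assumes strictly positive inputs; the function names and the example flux values are hypothetical, and the sketch does not reproduce the bivariate model or the parameterisation used in the paper.

```python
import numpy as np


def box_cox(x, lam):
    """One-parameter Box-Cox transform of strictly positive values."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive inputs")
    if lam == 0:
        return np.log(x)  # limiting case as lam -> 0
    return (x**lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Map transformed values back to the original (positive) scale."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


# Hypothetical, right-skewed flux values (arbitrary units).
flux = np.array([0.2, 0.5, 1.0, 4.0, 25.0])
transformed = box_cox(flux, lam=0.25)
recovered = inverse_box_cox(transformed, lam=0.25)
```

A trans-Gaussian model places a Gaussian process on the transformed scale, so that the field on the original scale is skewed and positive, which is the general idea behind using such transformations for flux fields.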
The coronavirus disease 2019 (COVID-19) global pandemic has led many countries to impose unprecedented lockdown measures in order to slow down the outbreak. Questions on whether governments have acted promptly enough, and whether lockdown measures can be lifted soon, have since been central in public discourse. Data-driven models that predict COVID-19 fatalities under different lockdown policy scenarios are essential for addressing these questions and informing governments on future policy directions. To this end, this paper develops a Bayesian model for predicting the effects of COVID-19 lockdown policies in a global context -- we treat each country as a distinct data point, and exploit variations of policies across countries to learn country-specific policy effects. Our model utilizes a two-layer Gaussian process (GP) prior -- the lower layer uses a compartmental SEIR (Susceptible, Exposed, Infected, Recovered) model as a prior mean function with country-and-policy-specific parameters that capture fatality curves under counterfactual policies within each country, whereas the upper layer is shared across all countries, and learns lower-layer SEIR parameters as a function of a country's features and its policy indicators. Our model combines the solid mechanistic foundations of SEIR models (Bayesian priors) with the flexible data-driven modeling and gradient-based optimization routines of machine learning (Bayesian posteriors) -- i.e., the entire model is trained end-to-end via stochastic variational inference. We compare the projections of COVID-19 fatalities by our model with other models listed by the Centers for Disease Control and Prevention (CDC), and provide scenario analyses for various lockdown and reopening strategies, highlighting their impact on COVID-19 fatalities.
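To make the SEIR compartmental structure concrete, the sketch below integrates a textbook deterministic SEIR model with simple Euler steps. The function name, the rate values and the population size are hypothetical and chosen only for illustration; the sketch omits the fatality component and the Gaussian process layers, and is not the parameterisation used in the paper.

```python
import numpy as np


def simulate_seir(beta, sigma, gamma, population, initial_infected,
                  n_days, dt=0.1):
    """Euler integration of a deterministic SEIR model.

    beta:  transmission rate (per day), the quantity a lockdown would lower
    sigma: rate at which exposed individuals become infectious (per day)
    gamma: recovery rate (per day)
    Returns an array of shape (n_days + 1, 4) with daily S, E, I, R values.
    """
    steps_per_day = int(round(1.0 / dt))
    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    history = np.empty((n_days + 1, 4))
    history[0] = (s, e, i, r)
    for day in range(1, n_days + 1):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history[day] = (s, e, i, r)
    return history


# Two hypothetical policy scenarios that differ only in transmission rate.
no_lockdown = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                            population=1_000_000, initial_infected=10,
                            n_days=200)
lockdown = simulate_seir(beta=0.2, sigma=0.2, gamma=0.1,
                         population=1_000_000, initial_infected=10,
                         n_days=200)
```

Comparing the two trajectories is a toy version of a counterfactual policy analysis: the same mechanistic curve is evaluated under different policy-dependent parameters.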
Probabilistic regression models typically use Maximum Likelihood Estimation or Cross-Validation to fit parameters. Unfortunately, these methods may favor solutions that fit the observations on average, but they do not pay attention to the coverage and the width of Prediction Intervals. In this paper, we address the question of adjusting and calibrating Prediction Intervals for Gaussian Process Regression. First, we determine the model's parameters by a standard Cross-Validation or Maximum Likelihood Estimation method; then we adjust the parameters so that the optimal type II Coverage Probability reaches a nominal level. We apply a relaxation method to choose parameters that minimize the Wasserstein distance between the Gaussian distribution of the initial parameters (Cross-Validation or Maximum Likelihood Estimation) and the proposed Gaussian distribution among the set of parameters that achieve the desired Coverage Probability.
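The notion of coverage can be illustrated with a minimal sketch that measures how often observations fall inside central Gaussian prediction intervals. This is the plain empirical coverage on held-out data, not the type II Coverage Probability defined in the paper, and the function name and synthetic data are assumptions made for the example.

```python
import numpy as np
from statistics import NormalDist


def empirical_coverage(y_true, pred_mean, pred_std, nominal_level=0.9):
    """Fraction of observations inside the central Gaussian interval."""
    # Two-sided quantile, e.g. about 1.645 for a 90% interval.
    z = NormalDist().inv_cdf(0.5 + nominal_level / 2.0)
    lower = pred_mean - z * pred_std
    upper = pred_mean + z * pred_std
    return float(np.mean((y_true >= lower) & (y_true <= upper)))


# Synthetic check: the data are standard normal around a zero mean.
rng = np.random.default_rng(1)
y = rng.standard_normal(20_000)
mean = np.zeros_like(y)

# A well-calibrated predictive standard deviation versus one that is
# too small, which produces intervals that are too narrow.
coverage_calibrated = empirical_coverage(y, mean, np.ones_like(y))
coverage_overconfident = empirical_coverage(y, mean, 0.5 * np.ones_like(y))
```

When the predictive standard deviation is underestimated, the empirical coverage falls well below the nominal 90%, which is the kind of miscalibration that adjusting the fitted parameters is meant to correct.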