Demographic studies of cosmic populations must contend with measurement errors and selection effects. We survey some of the key ideas astronomers have developed to deal with these complications, in the context of galaxy surveys and the literature on corrections for Malmquist and Eddington bias. From the perspective of modern statistics, such corrections arise naturally in the context of multilevel models, particularly in Bayesian treatments of such models: hierarchical Bayesian models. We then review key lessons from hierarchical Bayesian modeling, including shrinkage estimation, which is closely related to traditional corrections devised by astronomers. We describe a framework for hierarchical Bayesian modeling of cosmic populations, tailored to features of astronomical surveys that are not typical of surveys in other disciplines. This thinned latent marked point process framework accounts for the tie between selection (detection) and measurement in astronomical surveys, treating selection and measurement error effects in a self-consistent manner.
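As a textbook illustration of the shrinkage estimation mentioned above (a minimal sketch, not taken from the paper itself): in a simple Normal-Normal hierarchy with true values $\theta_i \sim N(\mu, \tau^2)$ and measurements $m_i \sim N(\theta_i, \sigma^2)$, the posterior mean is
$$ \hat{\theta}_i = (1 - B)\, m_i + B\, \mu, \qquad B = \frac{\sigma^2}{\sigma^2 + \tau^2}, $$
so each noisy measurement is pulled toward the population mean by an amount governed by the ratio of measurement variance to population variance, which is the same logic underlying classical Malmquist-style corrections.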
CLEAN, the commonly employed imaging algorithm in radio interferometry, suffers from a number of shortcomings: in its basic version it does not have the concept of diffuse flux, and the common practice of convolving the CLEAN components with the CLEAN beam erases the potential for super-resolution; it does not output uncertainty information; it produces images with unphysical negative flux regions; and its results are highly dependent on the so-called weighting scheme as well as on any human choice of CLEAN masks used to guide the imaging. Here, we present the Bayesian imaging algorithm resolve, which solves the above problems and naturally leads to super-resolution. We take a VLA observation of Cygnus~A at four different frequencies and image it with single-scale CLEAN, multi-scale CLEAN, and resolve. Alongside the sky brightness distribution, resolve estimates a baseline-dependent correction function for the noise budget, the Bayesian equivalent of weighting schemes. We report noise correction factors between 0.4 and 429. The enhancements achieved by resolve come at the cost of higher computational effort.
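A minimal sketch of what a baseline-dependent noise correction can look like in the likelihood (generic notation; $\alpha_b$, $\sigma_b$, and $R$ are illustrative and not necessarily resolve's exact parameterization): for visibility data $d_b$ on baseline $b$ with reported noise level $\sigma_b$, sky brightness $s$, and measurement response $R$,
$$ d_b \sim \mathcal{N}\!\big( (R s)_b,\ (\alpha_b\, \sigma_b)^2 \big), $$
with the correction factors $\alpha_b$ inferred jointly with the sky. Reported factors between 0.4 and 429 then correspond to effective error bars shrunk or inflated by those amounts.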
In record linkage (RL), or exact file matching, the goal is to identify the links between entities with information on two or more files. RL is an important activity in areas including counting the population, enhancing survey frames and data, and conducting epidemiological and follow-up studies. RL is challenging when files are very large, no accurate personal identification (ID) number is present on all files for all units, and some information is recorded with error. Without a unique ID number one must rely on comparisons of names, addresses, dates, and other information to find the links. Latent class models can be used to automatically score the value of information for determining match status. Data for fitting the models come from comparisons made within groups of units that pass initial file blocking requirements. Data distributions can vary across blocks. This article examines the use of prior information and hierarchical latent class models in the context of RL.
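As a minimal illustration of the latent class idea (two latent classes, links and non-links, with conditionally independent field comparisons; the probabilities below are invented for illustration and are not from the article's hierarchical model), the familiar log-likelihood-ratio match weight can be computed as:

    import numpy as np

    # Illustrative m- and u-probabilities for three comparison fields
    # (e.g. last name, first name, birth year); the numbers are made up.
    m = np.array([0.95, 0.90, 0.85])   # P(field agrees | pair is a link)
    u = np.array([0.01, 0.05, 0.10])   # P(field agrees | pair is not a link)

    def match_weight(gamma):
        """Log-likelihood-ratio score for a 0/1 comparison vector gamma,
        assuming field comparisons are independent given the latent class."""
        gamma = np.asarray(gamma)
        return float(np.sum(gamma * np.log(m / u)
                            + (1 - gamma) * np.log((1 - m) / (1 - u))))

    print(match_weight([1, 1, 1]))   # strong evidence for a link
    print(match_weight([0, 0, 1]))   # evidence against a link

The hierarchical models discussed in the article address the fact that such agreement probabilities can differ from block to block.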
In the second paper of this series we extend our Bayesian reanalysis of the evidence for a cosmic variation of the fine structure constant to the semi-parametric modelling regime. By adopting a mixture of Dirichlet processes prior for the unexplained errors in each instrumental subgroup of the benchmark quasar dataset, we go some way towards freeing our model selection procedure from the apparent subjectivity of a fixed distributional form. Despite the infinite-dimensional domain of the error hierarchy so constructed, we are able to demonstrate a recursive scheme for marginal likelihood estimation with prior-sensitivity analysis directly analogous to that presented in Paper I, thereby allowing the robustness of our posterior Bayes factors to hyper-parameter choice and model specification to be readily verified. In the course of this work we elucidate various similarities between unexplained error problems in the seemingly disparate fields of astronomy and clinical meta-analysis, and we highlight a number of sophisticated techniques for handling such problems made available by past research in the latter. It is our hope that the novel approach to semi-parametric model selection demonstrated herein may serve as a useful reference for others exploring this potentially difficult class of error models.
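For readers less familiar with the mixture-of-Dirichlet-processes construction, a standard stick-breaking representation of a Dirichlet process prior (generic notation, not the paper's specific hierarchy) is
$$ G = \sum_{k=1}^{\infty} \pi_k \, \delta_{\theta_k}, \qquad \pi_k = v_k \prod_{j<k} (1 - v_j), \qquad v_k \sim \mathrm{Beta}(1, \alpha), \qquad \theta_k \sim G_0, $$
so the unexplained-error distribution in each subgroup is represented as a countable mixture whose weights and atoms are inferred from the data, rather than being fixed to a particular parametric form.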
Conventional Type Ia supernova (SN Ia) cosmology analyses currently use a simplistic linear regression of magnitude versus color and light curve shape, which does not model intrinsic SN Ia variations and host galaxy dust as physically distinct effects, resulting in low color-magnitude slopes. We construct a probabilistic generative model for the dusty distribution of extinguished absolute magnitudes and apparent colors as the convolution of an intrinsic SN Ia color-magnitude distribution and a host galaxy dust reddening-extinction distribution. If the intrinsic color-magnitude ($M_B$ vs. $B-V$) slope $\beta_{int}$ differs from the host galaxy dust law $R_B$, this convolution results in a specific curve of mean extinguished absolute magnitude vs. apparent color. The derivative of this curve smoothly transitions from $\beta_{int}$ in the blue tail to $R_B$ in the red tail of the apparent color distribution. The conventional linear fit approximates this effective curve near the average apparent color, resulting in an apparent slope $\beta_{app}$ between $\beta_{int}$ and $R_B$. We incorporate these effects into a hierarchical Bayesian statistical model for SN Ia light curve measurements, and analyze a dataset of SALT2 optical light curve fits of 248 nearby SN Ia at $z < 0.10$. The conventional linear fit obtains $\beta_{app} \approx 3$. Our model finds $\beta_{int} = 2.3 \pm 0.3$ and a distinct dust law of $R_B = 3.8 \pm 0.3$, consistent with the average for Milky Way dust, while correcting a systematic distance bias of $\sim 0.10$ mag in the tails of the apparent color distribution. Finally, we extend our model to examine the SN Ia luminosity-host mass dependence in terms of intrinsic and dust components.
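The transition of the effective slope from $\beta_{int}$ in the blue tail to $R_B$ in the red tail can be illustrated with a toy simulation (a minimal sketch; the distributions and numerical values below are illustrative, not the paper's fitted model):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    beta_int, R_B = 2.3, 3.8            # illustrative intrinsic slope and dust law
    c_int = rng.normal(0.0, 0.06, n)    # intrinsic B-V color
    E = rng.exponential(0.07, n)        # host-galaxy dust reddening E(B-V) >= 0

    M_app = beta_int * c_int + R_B * E  # extinguished absolute magnitude (constant offset dropped)
    c_app = c_int + E                   # apparent color

    def local_slope(lo, hi):
        """Least-squares slope of M_app vs. c_app within a color window."""
        sel = (c_app > lo) & (c_app < hi)
        return np.polyfit(c_app[sel], M_app[sel], 1)[0]

    print(local_slope(-0.20, -0.05))    # blue tail: slope close to beta_int
    print(local_slope(0.20, 0.50))      # red tail: slope close to R_B

A single straight-line fit over the whole sample lands in between, which is the apparent slope $\beta_{app}$ described above.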
Count data are common, particularly in medicine and disease mapping, and under- or overdispersion in such data undermines the performance of the classical Poisson model. To take this problem into account, in this paper we introduce a new Bayesian structured additive regression model, called the gamma count model, with enough flexibility to model dispersion. Setting suitable prior distributions on the model parameters is an important issue in Bayesian statistics, since the priors characterize our uncertainty about the parameters. Relying on a recently proposed class of penalized complexity priors, motivated by a general set of construction principles, we derive the prior structure. The model can be formulated as a latent Gaussian model, and consequently fast computation can be carried out using the integrated nested Laplace approximation (INLA) method. We investigate the proposed methodology in a simulation study, and examine several appropriate prior distributions to provide a reasonable sensitivity analysis. To illustrate the applicability of the proposed model, we analyze two real-world data sets, on larynx cancer mortality in Germany and on the Handball Champions League.
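For concreteness, a standard form of the gamma count probability mass function (in which inter-arrival times are gamma distributed; the parameterization below is one common convention and may differ from the paper's) can be evaluated directly:

    from scipy.special import gammainc   # regularized lower incomplete gamma
    from scipy.stats import poisson

    def gamma_count_pmf(n, mu, alpha):
        """P(N = n) with rate parameter mu (the mean count when alpha = 1) and
        dispersion parameter alpha: alpha > 1 gives underdispersion,
        alpha < 1 overdispersion, alpha = 1 recovers the Poisson model."""
        lower = 1.0 if n == 0 else gammainc(alpha * n, alpha * mu)
        upper = gammainc(alpha * (n + 1), alpha * mu)
        return lower - upper

    # Sanity check: alpha = 1 reproduces the Poisson pmf.
    print(gamma_count_pmf(3, 2.5, 1.0), poisson.pmf(3, 2.5))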