No Arabic abstract
Modern genomic studies are increasingly focused on discovering more and more interesting genes associated with a health response. Traditional shrinkage priors are primarily designed to detect a handful of signals from tens and thousands of predictors. Under diverse sparsity regimes, the nature of signal detection is associated with a tail behaviour of a prior. A desirable tail behaviour is called tail-adaptive shrinkage property where tail-heaviness of a prior gets adaptively larger (or smaller) as a sparsity level increases (or decreases) to accommodate more (or less) signals. We propose a global-local-tail (GLT) Gaussian mixture distribution to ensure this property and provide accurate inference under diverse sparsity regimes. Incorporating a peaks-over-threshold method in extreme value theory, we develop an automated tail learning algorithm for the GLT prior. We compare the performance of the GLT prior to the Horseshoe in two gene expression datasets and numerical examples. Results suggest that varying tail rule is advantageous over fixed tail rule under diverse sparsity domains.
In record linkage (RL), or exact file matching, the goal is to identify the links between entities with information on two or more files. RL is an important activity in areas including counting the population, enhancing survey frames and data, and conducting epidemiological and follow-up studies. RL is challenging when files are very large, no accurate personal identification (ID) number is present on all files for all units, and some information is recorded with error. Without an unique ID number one must rely on comparisons of names, addresses, dates, and other information to find the links. Latent class models can be used to automatically score the value of information for determining match status. Data for fitting models come from comparisons made within groups of units that pass initial file blocking requirements. Data distributions can vary across blocks. This article examines the use of prior information and hierarchical latent class models in the context of RL.
Shrinkage prior are becoming more and more popular in Bayesian modeling for high dimensional sparse problems due to its computational efficiency. Recent works show that a polynomially decaying prior leads to satisfactory posterior asymptotics under regression models. In the literature, statisticians have investigated how the global shrinkage parameter, i.e., the scale parameter, in a heavy tail prior affects the posterior contraction. In this work, we explore how the shape of the prior, or more specifically, the polynomial order of the prior tail affects the posterior. We discover that, under the sparse normal means models, the polynomial order does affect the multiplicative constant of the posterior contraction rate. More importantly, if the polynomial order is sufficiently close to 1, it will induce the optimal Bayesian posterior convergence, in the sense that the Bayesian contraction rate is sharply minimax, i.e., not only the order, but also the multiplicative constant of the posterior contraction rate are optimal. The above Bayesian sharp minimaxity holds when the global shrinkage parameter follows a deterministic choice which depends on the unknown sparsity $s$. Therefore, a Beta-prior modeling is further proposed, such that our sharply minimax Bayesian procedure is adaptive to unknown $s$. Our theoretical discoveries are justified by simulation studies.
We consider the problem of variable selection in high-dimensional settings with missing observations among the covariates. To address this relatively understudied problem, we propose a new synergistic procedure -- adaptive Bayesian SLOPE -- which effectively combines the SLOPE method (sorted $l_1$ regularization) together with the Spike-and-Slab LASSO method. We position our approach within a Bayesian framework which allows for simultaneous variable selection and parameter estimation, despite the missing values. As with the Spike-and-Slab LASSO, the coefficients are regarded as arising from a hierarchical model consisting of two groups: (1) the spike for the inactive and (2) the slab for the active. However, instead of assigning independent spike priors for each covariate, here we deploy a joint SLOPE spike prior which takes into account the ordering of coefficient magnitudes in order to control for false discoveries. Through extensive simulations, we demonstrate satisfactory performance in terms of power, FDR and estimation bias under a wide range of scenarios. Finally, we analyze a real dataset consisting of patients from Paris hospitals who underwent a severe trauma, where we show excellent performance in predicting platelet levels. Our methodology has been implemented in C++ and wrapped into an R package ABSLOPE for public use.
We develop singular value shrinkage priors for the mean matrix parameters in the matrix-variate normal model with known covariance matrices. Our priors are superharmonic and put more weight on matrices with smaller singular values. They are a natural generalization of the Stein prior. Bayes estimators and Bayesian predictive densities based on our priors are minimax and dominate those based on the uniform prior in finite samples. In particular, our priors work well when the true value of the parameter has low rank.
Radio tomographic imaging (RTI) is an emerging technology to locate physical objects in a geographical area covered by wireless networks. From the attenuation measurements collected at spatially distributed sensors, radio tomography capitalizes on spatial loss fields (SLFs) measuring the absorption of radio frequency waves at each location along the propagation path. These SLFs can be utilized for interference management in wireless communication networks, environmental monitoring, and survivor localization after natural disaster such as earthquakes. Key to success of RTI is to model accurately the shadowing effects as the bi-dimensional integral of the SLF scaled by a weight function, which is estimated using regularized regression. However, the existing approaches are less effective when the propagation environment is heterogeneous. To cope with this, the present work introduces a piecewise homogeneous SLF governed by a hidden Markov random field (MRF) model. Efficient and tractable SLF estimators are developed by leveraging Markov chain Monte Carlo (MCMC) techniques. Furthermore, an uncertainty sampling method is developed to adaptively collect informative measurements in estimating the SLF. Numerical tests using synthetic and real datasets demonstrate capabilities of the proposed algorithm for radio tomography and channel-gain estimation.