The problem of adaptive sampling for estimating probability mass functions (pmfs) uniformly well is considered. Performance of the sampling strategy is measured in terms of the worst-case mean squared error. A Bayesian variant of the existing upper confidence bound (UCB) based approaches is proposed. It is shown analytically that the performance of this Bayesian variant is no worse than that of the existing approaches. The posterior distribution on the pmfs in the Bayesian setting allows for a tighter computation of upper confidence bounds, which leads to significant performance gains in practice. Using this approach, adaptive sampling protocols are proposed for estimating SARS-CoV-2 seroprevalence across groups defined by, for example, location and ethnicity. The effectiveness of this strategy is discussed using data obtained from a seroprevalence survey in Los Angeles County.
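As a rough illustration of the frequentist UCB-style allocation that the Bayesian variant builds on, the sketch below always samples the group whose upper confidence bound on the per-sample error p(1-p)/n is largest; the Hoeffding-style radius and the name ucb_allocate are illustrative assumptions rather than the paper's exact procedure.

\begin{verbatim}
import numpy as np

def ucb_allocate(successes, counts, delta=0.05):
    """Pick the next group to sample: the one whose upper confidence
    bound on its estimation error p*(1-p)/n is largest. (Generic
    frequentist sketch; a Bayesian variant would instead use
    posterior quantiles to obtain tighter bounds.)"""
    successes = np.asarray(successes, dtype=float)
    n = np.maximum(np.asarray(counts, dtype=float), 1.0)
    p_hat = successes / n
    radius = np.sqrt(np.log(2.0 / delta) / (2.0 * n))  # Hoeffding radius
    p_lo = np.clip(p_hat - radius, 0.0, 1.0)
    p_up = np.clip(p_hat + radius, 0.0, 1.0)
    # largest value of p*(1-p) attainable inside the confidence interval
    var_up = np.where((p_lo <= 0.5) & (p_up >= 0.5), 0.25,
                      np.maximum(p_lo * (1 - p_lo), p_up * (1 - p_up)))
    return int(np.argmax(var_up / n))
\end{verbatim}

At each round the selected group is sampled once, its success and sample counts are updated, and the rule is applied again.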
In this paper, the method UCB-RS, which resorts to a recommendation system (RS) to enhance the upper confidence bound algorithm UCB, is presented. The proposed method is designed for non-stationary multi-armed bandit problems with large state spaces. The proposed method is targeted at the problem of product recommendation in online advertising. Through extensive testing with RecoGym, an OpenAI Gym-based reinforcement learning environment for product recommendation in online advertising, the proposed method is shown to outperform widely used reinforcement learning schemes such as $\epsilon$-Greedy, Upper Confidence Bound (UCB1), and Exponential Weights for Exploration and Exploitation (EXP3).
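For reference, a minimal implementation of the UCB1 baseline (one of the comparison schemes) is sketched below; UCB-RS additionally consults the recommendation system when choosing arms, which is not shown here, and the class name UCB1 is just an illustrative label.

\begin{verbatim}
import math

class UCB1:
    """Minimal UCB1 baseline from the comparison; UCB-RS augments
    this index with signals from a recommendation system."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select(self):
        # play every arm once, then follow the UCB1 index
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm
        t = sum(self.counts)
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a]
                   + math.sqrt(2.0 * math.log(t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
\end{verbatim}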
We consider the problem of estimating the rate of defects (mean number of defects per item), given the counts of defects detected by two independent imperfect inspectors on one sample of items. In contrast with the setting for the well-known method of Capture-Recapture, we \textit{do not} have information regarding the number of defects jointly detected by \textit{both} inspectors. We solve this problem by constructing two types of estimators: a simple moment-type estimator and a more elaborate maximum-likelihood estimator. The performance of these estimators is studied analytically and by means of simulations. It is shown that the maximum-likelihood estimator is superior to the moment-type estimator. A systematic comparison with the Capture-Recapture method is also made.
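For comparison, the classical Capture-Recapture (Lincoln-Petersen) estimate is reproduced below; it relies on the count of defects detected by both inspectors, which is exactly the information assumed unavailable here, so it serves only as the baseline against which the proposed estimators are measured.

\begin{verbatim}
def capture_recapture_total(n1, n2, m):
    """Lincoln-Petersen estimate of the total number of defects from
    inspector 1's count n1, inspector 2's count n2 and the number m of
    defects detected by both inspectors (unavailable in this paper's
    setting; shown only as the comparison baseline)."""
    if m == 0:
        raise ValueError("Capture-Recapture needs at least one joint detection")
    return n1 * n2 / m

# Example: 40 and 30 detections with 12 defects found by both inspectors
# give an estimated 100 defects in total; on a sample of 50 items the
# estimated rate is 2 defects per item.
rate = capture_recapture_total(40, 30, 12) / 50
\end{verbatim}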
The United States Department of Agriculture's National Agricultural Statistics Service (NASS) conducts the June Agricultural Survey (JAS) annually. Substantial misclassification occurs during the pre-screening process and from field-estimating the farm status of non-response and inaccessible records, resulting in a biased estimate of the number of US farms from the JAS. Here the Annual Land Utilization Survey (ALUS) is proposed as a follow-on survey to the JAS to adjust the estimates of the number of US farms and other important variables. A three-phase survey design-based estimator is developed for the JAS-ALUS, with a non-response adjustment for the second phase (ALUS). A design-unbiased estimator of the variance is provided in explicit form.
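To fix ideas about what a design-based estimator with a non-response adjustment looks like, the generic weighting-class sketch below may help; it is only an illustration with made-up variable names, not the three-phase JAS-ALUS estimator or its explicit variance formula.

\begin{verbatim}
import numpy as np

def adjusted_total(y, design_weight, responded, adj_class):
    """Design-weighted total with a weighting-class non-response
    adjustment (generic illustration, not the JAS-ALUS estimator)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(design_weight, dtype=float)
    r = np.asarray(responded, dtype=bool)
    cls = np.asarray(adj_class)
    total = 0.0
    for c in np.unique(cls):
        in_c = cls == c
        # inflate respondent weights by the inverse weighted response rate
        adj = w[in_c].sum() / w[in_c & r].sum()
        total += np.sum(adj * w[in_c & r] * y[in_c & r])
    return total
\end{verbatim}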
We consider the problem of constructing Bayesian-based confidence sets for linear functionals in the inverse Gaussian white noise model. We work with a scale of Gaussian priors indexed by a regularity hyper-parameter and apply a (slightly modified) data-driven marginal likelihood empirical Bayes method for the choice of this hyper-parameter. We show by theory and simulations that the credible sets constructed by this method have sub-optimal behaviour in general. However, under the assumption of self-similarity the credible sets have rate-adaptive size and optimal coverage. As an application of these results, we construct $L_\infty$-credible bands for the true functional parameter with adaptive size and optimal coverage under the self-similarity constraint.
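In the standard sequence-space formulation of this model, the marginal likelihood empirical Bayes choice of the regularity hyper-parameter can be sketched as follows; the prior variances i^{-1-2a} and the optimizer bounds are common modelling conventions assumed for illustration, and the paper works with a slightly modified version of this criterion.

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize_scalar

def eb_regularity(y, n, bounds=(0.01, 10.0)):
    """Empirical Bayes regularity for the sequence model
    Y_i = theta_i + n^{-1/2} Z_i with prior theta_i ~ N(0, i^{-1-2a}):
    maximise the marginal likelihood of the observations over a."""
    y = np.asarray(y, dtype=float)
    i = np.arange(1, len(y) + 1, dtype=float)

    def neg_log_marginal(a):
        v = i ** (-1.0 - 2.0 * a) + 1.0 / n  # marginal variance of Y_i
        return 0.5 * np.sum(np.log(v) + y ** 2 / v)

    return minimize_scalar(neg_log_marginal, bounds=bounds,
                           method="bounded").x
\end{verbatim}

Credible sets for the functional of interest are then read off from the posterior evaluated at the selected value of the hyper-parameter.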
Ensemble learning is a mainstay in modern data science practice. Conventional ensemble algorithms assign to base models a set of deterministic, constant model weights that (1) do not fully account for individual models' varying accuracy across data subgroups, and (2) do not provide uncertainty estimates for the ensemble prediction. These shortcomings can yield predictions that are precise but biased, which can negatively impact the performance of the algorithm in real-world applications. In this work, we present an adaptive, probabilistic approach to ensemble learning using a transformed Gaussian process as a prior for the ensemble weights. Given input features, our method optimally combines base models based on their predictive accuracy in the feature space, and provides interpretable estimates of the uncertainty associated with both model selection, as reflected by the ensemble weights, and the overall ensemble predictions. Furthermore, to ensure that this quantification of the model uncertainty is accurate, we propose additional machinery to non-parametrically model the ensemble's predictive cumulative distribution function (CDF) so that it is consistent with the empirical distribution of the data. We apply the proposed method to data simulated from a nonlinear regression model, and to generate a spatial prediction model and associated prediction uncertainties for fine particle levels in eastern Massachusetts, USA.
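A minimal sketch of such a prior is given below: one latent Gaussian process per base model, pushed through a softmax so the weights are positive, sum to one, and vary with the input features. The squared-exponential kernel and the softmax link are assumptions made for illustration (the abstract only states that a transformed Gaussian process is used), and posterior inference and the CDF calibration machinery are not shown.

\begin{verbatim}
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input locations."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_adaptive_weights(x, n_models, rng, lengthscale=1.0):
    """Draw input-dependent ensemble weights: one latent GP sample per
    base model, mapped through a softmax (assumed transformation)."""
    K = rbf_kernel(x, x, lengthscale) + 1e-8 * np.eye(len(x))
    latent = np.linalg.cholesky(K) @ rng.standard_normal((len(x), n_models))
    w = np.exp(latent - latent.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)  # each row sums to one

# Combine three toy base models with a prior draw of the weights.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
base_preds = np.column_stack([np.sin(2 * np.pi * x), x, x ** 2])
weights = sample_adaptive_weights(x, base_preds.shape[1], rng)
ensemble_mean = np.sum(weights * base_preds, axis=1)
\end{verbatim}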