The United States Department of Agriculture's National Agricultural Statistics Service (NASS) conducts the June Agricultural Survey (JAS) annually. Substantial misclassification occurs during the pre-screening process and when farm status is field-estimated for non-response and inaccessible records, which biases the JAS estimate of the number of US farms. Here the Annual Land Utilization Survey (ALUS) is proposed as a follow-on survey to the JAS to adjust the estimates of the number of US farms and other important variables. A three-phase, design-based estimator is developed for the JAS-ALUS, with a non-response adjustment for the second phase (ALUS). A design-unbiased estimator of its variance is provided in explicit form.
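The abstract does not spell out the JAS-ALUS estimator. The sketch below shows only the generic shape of a multi-phase expansion estimator with a weighting-class non-response adjustment at phase 2. It is not the authors' estimator. The record layout, the field names, and the assumption that phase 3 is subsampled from phase-2 respondents are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """One phase-2 sampled unit. All field names are hypothetical."""
    pi1: float        # phase-1 inclusion probability
    pi2: float        # phase-2 inclusion probability, conditional on phase 1
    pi3: float        # phase-3 inclusion probability, conditional on phase-2 response (assumed design)
    cls: str          # non-response weighting class
    responded: bool   # responded to the phase-2 follow-on survey
    in_phase3: bool   # selected into phase 3 (only meaningful if responded)
    y: float          # study variable, used only for phase-3 units


def nonresponse_factors(records):
    """Weighting-class adjustment: weighted sampled count / weighted respondent count."""
    sampled, responded = defaultdict(float), defaultdict(float)
    for r in records:
        w = 1.0 / (r.pi1 * r.pi2)  # phase-2 base weight
        sampled[r.cls] += w
        if r.responded:
            responded[r.cls] += w
    return {c: sampled[c] / responded[c] for c in sampled if responded[c] > 0}


def three_phase_total(records):
    """Expansion estimator of a population total from respondents that reach phase 3."""
    adj = nonresponse_factors(records)
    total = 0.0
    for r in records:
        if r.responded and r.in_phase3:
            total += r.y * adj[r.cls] / (r.pi1 * r.pi2 * r.pi3)
    return total


# Four phase-2 units in one class; three respond and all respondents enter phase 3.
example = [
    Record(0.5, 0.5, 1.0, "a", True, True, 1.0),
    Record(0.5, 0.5, 1.0, "a", True, True, 2.0),
    Record(0.5, 0.5, 1.0, "a", True, True, 3.0),
    Record(0.5, 0.5, 1.0, "a", False, False, 0.0),
]
print(three_phase_total(example))
```

In the example each unit has base weight 4, and the class adjustment is 16/12. The respondents' weighted total is therefore (1 + 2 + 3) × 4 × 4/3 = 32.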
The cyclical and heterogeneous nature of many substance use disorders highlights the need to adapt the type or dose of treatment to the specific and changing needs of individuals. The Adaptive Treatment for Alcohol and Cocaine Dependence study (ENGAGE) is a multi-stage randomized trial that aimed to provide longitudinal data for constructing treatment strategies to improve patients' engagement in therapy. However, the high rate of noncompliance and the lack of analytic tools that account for noncompliance have kept researchers from using the data to achieve the main goal of the trial. We overcome this issue by defining our target parameter as the mean outcome under different treatment strategies within given potential compliance strata, and we propose a Bayesian semiparametric model to estimate this quantity. One important feature of our work is that we consider partial rather than binary compliance classes, which are more relevant in longitudinal studies, although this adds substantial complexity to the analysis. We assess the performance of our method through comprehensive simulation studies. We illustrate its application on the ENGAGE study and demonstrate that the optimal treatment strategy depends on the compliance stratum.
This paper considers the instrumental variable quantile regression model (Chernozhukov and Hansen, 2005, 2013) with a binary endogenous treatment. It offers two identification results for the case in which the treatment status is not directly observed. The first result is that, remarkably, the reduced-form quantile regression of the outcome variable on the instrumental variable provides a lower bound on the structural quantile treatment effect under the stochastic monotonicity condition (Small and Tan, 2007; DiNardo and Lee, 2011). This result is relevant not only when the treatment variable is subject to misclassification but also when no measurement of the treatment variable is available at all. The second result concerns the structural quantile function when the treatment status is measured with error: I obtain the sharp identified set by deriving moment conditions under widely used assumptions on the measurement error. Finally, I propose an inference method that accommodates additional covariates.
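The toy simulation below illustrates the first result in the simplest possible setting. It is not the paper's proof. It assumes a constant structural effect `delta`, a binary instrument `z` that raises P(D = 1) from 0.3 to 0.7, and no covariates. With no covariates, quantile regression of the outcome on (1, z) reduces to a difference of within-group sample quantiles. The treatment `D` is never passed to the estimator, mimicking an unobserved treatment. For simplicity `D` is drawn independently of the noise, so this toy has no endogeneity.

```python
import math
import random


def sample_quantile(values, tau):
    """Smallest order statistic whose empirical CDF is at least tau."""
    xs = sorted(values)
    k = max(0, math.ceil(tau * len(xs)) - 1)
    return xs[k]


def reduced_form_qte(y, z, tau):
    """Reduced-form quantile effect of a binary instrument z on outcome y.

    Without covariates, quantile regression of y on (1, z) reduces to the
    difference of the within-group sample quantiles.
    """
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    return sample_quantile(y1, tau) - sample_quantile(y0, tau)


def simulate(n=20000, delta=2.0, seed=1):
    """Toy data with a constant structural effect delta (hypothetical design)."""
    rng = random.Random(seed)
    y, z = [], []
    for _ in range(n):
        zi = 1 if rng.random() < 0.5 else 0
        di = 1 if rng.random() < 0.3 + 0.4 * zi else 0  # treatment, never returned
        y.append(rng.gauss(0.0, 1.0) + delta * di)
        z.append(zi)
    return y, z


y, z = simulate()
rf = reduced_form_qte(y, z, tau=0.5)
print(f"reduced-form median effect: {rf:.3f} (structural effect: 2.0)")
```

In this design each group's outcome distribution is a mixture of N(0, 1) and N(2, 1), so both medians lie between the untreated and treated medians. The reduced-form difference therefore falls strictly between 0 and the structural effect of 2.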
The problem of adaptive sampling for estimating probability mass functions (pmfs) uniformly well is considered. The performance of a sampling strategy is measured by the worst-case mean squared error. A Bayesian variant of existing upper confidence bound (UCB) based approaches is proposed, and it is shown analytically that its performance is no worse than that of the existing approaches. In the Bayesian setting, the posterior distribution on the pmfs allows tighter upper confidence bounds to be computed, which leads to significant performance gains in practice. Using this approach, adaptive sampling protocols are proposed for estimating SARS-CoV-2 seroprevalence across groups defined by, for example, location and ethnicity. The effectiveness of this strategy is discussed using data obtained from a seroprevalence survey in Los Angeles County.
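The following is a minimal sketch of the Bayesian-UCB idea, not the authors' algorithm. Several details are assumptions. Each group's pmf gets a Dirichlet(1, ..., 1) prior. The loss is the summed squared error of the empirical pmf, whose MSE is (1 - Σ p_j²)/n. The "upper confidence bound" is a Monte Carlo 95% posterior quantile of 1 - Σ p_j², divided by the group's current sample size. At each step the group with the largest bound is sampled next.

```python
import random


def dirichlet_draw(rng, alpha):
    """One pmf drawn from Dirichlet(alpha) via normalized Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def mse_upper_bound(rng, counts, prior=1.0, draws=200, level=0.95):
    """Posterior upper quantile of (1 - sum p_j^2), divided by the sample size n."""
    alpha = [c + prior for c in counts]
    vals = sorted(1.0 - sum(p * p for p in dirichlet_draw(rng, alpha)) for _ in range(draws))
    q = vals[min(draws - 1, int(level * draws))]
    return q / max(1, sum(counts))


def adaptive_allocation(true_pmfs, budget, seed=0):
    """Spend `budget` samples across groups; returns the number drawn per group."""
    rng = random.Random(seed)
    counts = [[0] * len(p) for p in true_pmfs]
    # Sample every group once so that each has a non-zero sample size.
    for k, p in enumerate(true_pmfs):
        counts[k][rng.choices(range(len(p)), weights=p)[0]] += 1
    for _ in range(budget - len(true_pmfs)):
        bounds = [mse_upper_bound(rng, c) for c in counts]
        k = max(range(len(bounds)), key=bounds.__getitem__)  # most uncertain group
        p = true_pmfs[k]
        counts[k][rng.choices(range(len(p)), weights=p)[0]] += 1
    return [sum(c) for c in counts]


# Hypothetical groups: one balanced, one nearly degenerate.
alloc = adaptive_allocation([[0.5, 0.5], [0.98, 0.02]], budget=300)
print(alloc)
```

The balanced group has the larger per-sample error (0.5 versus about 0.04), so the rule steers most of the budget toward it. The Dirichlet draws are what would make the bound tighter than a distribution-free UCB.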
Heterogeneity is an important feature of modern data sets, and a central task is to extract information from large-scale, heterogeneous data. In this paper, we consider multiple high-dimensional linear models and adopt the definition of the maximin effect (Meinshausen and Bühlmann, AoS, 43(4), 1801–1830) to summarize the information contained in these heterogeneous models. We define the maximin effect for a targeted population whose covariate distribution may differ from that of the observed data. We further introduce a ridge-type maximin effect to simultaneously account for reward optimality and statistical stability. To identify the high-dimensional maximin effect, we estimate the regression covariance matrix with a debiased estimator and use it to construct the aggregation weights for the maximin effect. A main challenge for statistical inference is that the estimated weights may have a mixture distribution, so the resulting maximin effect estimator is not necessarily asymptotically normal. To address this, we devise a novel sampling approach to construct a confidence interval for any linear contrast of high-dimensional maximin effects. The coverage and precision properties of the proposed confidence interval are studied. The proposed method is demonstrated in simulations and on a genetic data set on yeast colony growth under different environments.
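The aggregation-weight step can be sketched as a small quadratic program. Following Meinshausen and Bühlmann, the maximin effect is β* = Σ_l γ_l b_l, where γ minimizes γᵀΓγ over the probability simplex and Γ_lk = b_lᵀ Σ b_k, with Σ the target-population covariance. The sketch assumes that the ridge-type variant adds δ to the diagonal of Γ. It also uses known b_l and Σ, whereas the paper plugs in debiased estimates, and it solves the program by projected gradient descent rather than any particular solver the authors may use. The function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def project_to_simplex(v):
    """Euclidean projection onto {g : g_l >= 0, sum_l g_l = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:  # holds for a prefix of j; keep the last threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def maximin_effect(bs, sigma, delta=0.0, iters=5000):
    """Aggregation weights gamma and maximin effect beta = sum_l gamma_l * b_l."""
    L = len(bs)
    sb = [matvec(sigma, b) for b in bs]
    # Gamma_lk = b_l' Sigma b_k, with delta added on the diagonal (assumed ridge form).
    G = [[dot(bs[l], sb[k]) + (delta if l == k else 0.0) for k in range(L)] for l in range(L)]
    # Gershgorin bound on the largest eigenvalue gives a safe step 1 / (2 * lam).
    lam = max(sum(abs(x) for x in row) for row in G) or 1.0
    step = 1.0 / (2.0 * lam)
    g = [1.0 / L] * L
    for _ in range(iters):
        grad = [2.0 * dot(row, g) for row in G]  # gradient of g' G g
        g = project_to_simplex([gi - step * di for gi, di in zip(g, grad)])
    beta = [sum(g[l] * bs[l][j] for l in range(L)) for j in range(len(bs[0]))]
    return g, beta


identity = [[1.0, 0.0], [0.0, 1.0]]
g, beta = maximin_effect([[1.0, 0.0], [2.0, 0.0]], identity)
print(g, beta)
```

In the example both groups point in the same direction with different magnitudes. All weight goes to the smaller effect (γ = (1, 0)), which is the effect guaranteed in every group.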
In this paper we study the impact of exposure misclassification when cluster size is potentially informative (i.e., related to outcomes) and when misclassification is differential by cluster size. First, we show that misclassification of an exposure related to cluster size can induce informativeness when cluster size would otherwise be non-informative. Second, we show that misclassification that is differential by informative cluster size can not only attenuate estimates of exposure effects but also inflate them or even reverse their sign. To correct for bias in estimating marginal parameters, we propose two frameworks: (i) an observed likelihood approach for joint marginalized models of cluster size and outcomes and (ii) an expected estimating equations approach. Although we focus on estimating marginal parameters, a corollary is that the observed likelihood approach permits valid inference for conditional parameters as well. Using data from the Nurses' Health Study II, we apply and compare the proposed correction methods in a motivating analysis of the multigenerational effect of in-utero diethylstilbestrol exposure on attention-deficit/hyperactivity disorder in 106,198 children of 47,450 nurses.
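The sign-reversal claim can be made concrete with a deliberately simple expected-value calculation. This is not the paper's joint marginalized model or its estimating equations. Cluster size enters only as a two-level stratum (`"small"`, `"large"`), and no within-cluster correlation is modeled. Sensitivity `se` and specificity `sp` may differ by stratum. All parameter values are hypothetical.

```python
def observed_risk_difference(p_large, prev, base_risk, effect, se, sp):
    """Expected risk difference comparing classified-exposed (W=1) with
    classified-unexposed (W=0) individuals in a two-stratum toy population.

    Outcome risk is base_risk[s] + effect * e, so the true risk difference is `effect`;
    a base risk that depends on the stratum makes cluster size informative.
    """
    num = {0: 0.0, 1: 0.0}  # P(Y=1, W=w)
    den = {0: 0.0, 1: 0.0}  # P(W=w)
    for s, p_s in (("small", 1.0 - p_large), ("large", p_large)):
        for e, p_e in ((1, prev), (0, 1.0 - prev)):
            risk = base_risk[s] + effect * e
            # P(W=1 | E=e, S=s): sensitivity if exposed, 1 - specificity otherwise.
            p_w1 = se[s] if e == 1 else 1.0 - sp[s]
            for w, p_w in ((1, p_w1), (0, 1.0 - p_w1)):
                mass = p_s * p_e * p_w
                den[w] += mass
                num[w] += mass * risk
    return num[1] / den[1] - num[0] / den[0]


base = {"small": 0.1, "large": 0.5}  # larger clusters have higher baseline risk
perfect = observed_risk_difference(0.5, 0.5, base, 0.1, {"small": 1.0, "large": 1.0}, {"small": 1.0, "large": 1.0})
nondiff = observed_risk_difference(0.5, 0.5, base, 0.1, {"small": 0.8, "large": 0.8}, {"small": 0.9, "large": 0.9})
diff = observed_risk_difference(0.5, 0.5, base, 0.1, {"small": 1.0, "large": 0.2}, {"small": 0.5, "large": 1.0})
print(perfect, nondiff, diff)
```

With no misclassification the true difference of 0.1 is recovered. The same non-differential error in both strata attenuates it to about 0.07. When misclassification differs by cluster size, the classified-exposed group is dominated by low-risk small clusters, and the estimate flips to about -0.23.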