The goal of a well-controlled study is to remove unwanted variation when estimating the causal effect of the intervention of interest. Experiments conducted in the basic sciences frequently achieve this goal using experimental controls, such as negative and positive controls, which are measurements designed to detect systematic sources of unwanted variation. Here, we introduce clear, mathematically precise definitions of experimental controls using potential outcomes. Our definitions provide a unifying statistical framework for fundamental concepts of experimental design from the biological and other basic sciences. These controls are defined in terms of whether assumptions are being made about a specific treatment level, outcome, or contrast between outcomes. We discuss experimental controls as tools for researchers to wield in designing experiments and detecting potential design flaws, including using controls to diagnose unintended factors that influence the outcome of interest, assess measurement error, and identify important subpopulations. We believe that experimental controls are powerful tools for reproducible research that are possibly underutilized by statisticians, epidemiologists, and social science researchers.
Unobserved confounding presents a major threat to the validity of causal inference from observational studies. In this paper, we introduce a novel framework that leverages the information in multiple parallel outcomes for identification and estimation of causal effects. Under a conditional independence structure among multiple parallel outcomes, we achieve nonparametric identification with at least three parallel outcomes. We further show that under a set of linear structural equation models, causal inference is possible with two parallel outcomes. We develop accompanying estimating procedures and evaluate their finite sample performance through simulation studies and a data application studying the causal effect of the tau protein level on various types of behavioral deficits.
Thompson sampling is a popular algorithm for solving multi-armed bandit problems, and has been applied in a wide range of applications, from website design to portfolio optimization. In such applications, however, the number of choices (or arms) $N$ can be large, and the data needed to make adaptive decisions require expensive experimentation. One is then faced with the constraint of experimenting on only a small subset of $K \ll N$ arms within each time period, which poses a problem for traditional Thompson sampling. We propose a new Thompson Sampling under Experimental Constraints (TSEC) method, which addresses this so-called arm budget constraint. TSEC makes use of a Bayesian interaction model with effect hierarchy priors, to model correlations between rewards on different arms. This fitted model is then integrated within Thompson sampling, to jointly identify a good subset of arms for experimentation and to allocate resources over these arms. We demonstrate the effectiveness of TSEC in two problems with arm budget constraints. The first is a simulated website optimization study, where TSEC shows noticeable improvements over industry benchmarks. The second is a portfolio optimization application on industry-based exchange-traded funds, where TSEC provides more consistent and greater wealth accumulation than standard investment strategies.
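To make the arm budget constraint concrete, the following is a minimal sketch of a plain Thompson sampling baseline that experiments on only $K$ of $N$ arms per period. It is not the TSEC method: it assumes Bernoulli rewards with independent Beta(1, 1) priors on each arm and omits the Bayesian interaction model and effect hierarchy priors entirely. The function name `top_k_thompson` and all parameter names are hypothetical and chosen for illustration only.

```python
import numpy as np


def top_k_thompson(true_means, k, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling restricted to k arms per round.

    Assumption (not from the abstract): arms are independent, each with a
    Beta(1, 1) prior on its Bernoulli success probability.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms = true_means.shape[0]

    alpha = np.ones(n_arms)              # posterior successes + 1
    beta = np.ones(n_arms)               # posterior failures + 1
    pulls = np.zeros(n_arms, dtype=int)  # number of times each arm was tried

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior.
        theta = rng.beta(alpha, beta)
        # Arm budget: experiment only on the k arms with the largest draws.
        chosen = np.argsort(theta)[-k:]
        # Simulate Bernoulli rewards for the chosen arms.
        rewards = (rng.random(k) < true_means[chosen]).astype(float)
        # Conjugate posterior update for the arms that were tried.
        alpha[chosen] += rewards
        beta[chosen] += 1.0 - rewards
        pulls[chosen] += 1

    return pulls, alpha, beta
```

Because this baseline treats arms as independent, it learns nothing about an arm it has not tried; modeling correlations between rewards on different arms, as the abstract describes, is what would let information from the $K$ tested arms inform the remaining $N - K$.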
Poverty is a multidimensional concept often comprising a monetary outcome and other welfare dimensions, such as education, subjective well-being or health, that are measured on an ordinal scale. In applied research, multidimensional poverty is ubiquitously assessed by studying each poverty dimension independently in univariate regression models or by combining several poverty dimensions into a scalar index. This inhibits a thorough analysis of the potentially varying interdependence between the poverty dimensions. We propose a multivariate copula generalized additive model for location, scale and shape (copula GAMLSS or distributional copula model) to tackle this challenge. By relating the copula parameter to covariates, we specifically examine whether certain factors determine the dependence between poverty dimensions. Furthermore, specifying the full conditional bivariate distribution allows us to derive several features, such as poverty risks and dependence measures, coherently from one model for different individuals. We demonstrate the approach by studying two important poverty dimensions: income and education. Since the level of education is measured on an ordinal scale while income is continuous, we extend the bivariate copula GAMLSS to the case of mixed ordered-continuous outcomes. The new model is integrated into the GJRM package in R and applied to data from Indonesia. Particular emphasis is given to the spatial variation of the income-education dependence and to groups of individuals at risk of being simultaneously poor in both the education and income dimensions.
Scientists have been interested in estimating causal peer effects to understand how people's behaviors are affected by their network peers. However, it is well known that identification and estimation of causal peer effects are challenging in observational studies for two reasons. The first is the identification challenge due to unmeasured network confounding, for example, homophily bias and contextual confounding. The second is the network dependence of observations, which one must take into account for valid statistical inference. Negative control variables, also known as placebo variables, have been widely used in observational studies, including peer effect analysis over networks, although they have been used primarily for bias detection. In this article, we establish a formal framework which leverages a pair of negative control outcome and exposure variables (double negative controls) to nonparametrically identify causal peer effects in the presence of unmeasured network confounding. We then propose a generalized method of moments estimator for causal peer effects, and establish its consistency and asymptotic normality under an assumption about $\psi$-network dependence. Finally, we provide a network heteroskedasticity and autocorrelation consistent variance estimator. Our methods are illustrated with an application to peer effects in education.
We develop Bayesian models for density regression with emphasis on discrete outcomes. The problem of density regression is approached by considering methods for multivariate density estimation of mixed scale variables, and obtaining conditional densities from the multivariate ones. The approach to multivariate mixed scale outcome density estimation that we describe represents discrete variables, either responses or covariates, as discretis