Experimentation has become an increasingly prevalent tool for guiding decision-making and policy choices. A common hurdle in designing experiments is the lack of statistical power. In this paper, we study the optimal multi-period experimental design under the constraint that the treatment cannot be easily removed once implemented; for example, a government might implement a public health intervention in different geographies at different times, where the treatment cannot be easily removed due to practical constraints. The treatment design problem is to select which geographies (referred to as units) to treat at which time, with the aim of testing hypotheses about the effect of the treatment. When the potential outcome is a linear function of unit and time effects and discrete observed/latent covariates, we provide an analytically feasible solution to the treatment design problem whose estimator variance is at most 1 + O(1/N^2) times the variance under the optimal treatment design, where N is the number of units. This solution assigns units in a staggered treatment adoption pattern: if the treatment affects only one period, the optimal fraction of treated units in each period increases linearly in time; if the treatment affects multiple periods, the optimal fraction increases non-linearly, growing slowly at the beginning and more rapidly toward the end. In the general setting where outcomes depend on latent covariates, we show that historical data can be utilized in designing experiments, and we propose a data-driven local search algorithm to assign units to treatment times. We demonstrate that our approach improves upon benchmark experimental designs via synthetic interventions on the influenza occurrence rate and synthetic experiments on interventions for in-home medical services and grocery expenditure.
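As a hedged illustration of the staggered adoption pattern described above (not the paper's exact optimal coefficients), the following NumPy sketch builds an absorbing treatment assignment in which the fraction of treated units grows linearly over time; N, T, and the linear schedule are hypothetical choices.

```python
import numpy as np

# Illustrative sketch (not the paper's exact formulas): build a staggered
# adoption assignment in which the fraction of treated units grows linearly
# over time, as described for the single-period-effect case.

def staggered_assignment(N: int, T: int, rng=None):
    rng = np.random.default_rng(rng)
    # Cumulative fraction of units treated by period t (placeholder linear
    # schedule; the optimal coefficients come from the paper's solution).
    frac = np.linspace(0.0, 1.0, T + 1)[1:]          # 1/T, 2/T, ..., 1
    counts = np.round(frac * N).astype(int)          # cumulative treated counts
    order = rng.permutation(N)                       # randomize adoption order
    W = np.zeros((N, T), dtype=int)                  # W[i, t] = 1 once unit i is treated
    for t in range(T):
        W[order[:counts[t]], t:] = 1                 # treatment stays on once adopted
    return W

W = staggered_assignment(N=20, T=5, rng=0)
print(W.sum(axis=0))  # number of treated units per period (non-decreasing)
```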
Bayesian optimal experimental design (BOED) is a principled framework for making efficient use of limited experimental resources. Unfortunately, its applicability is hampered by the difficulty of obtaining accurate estimates of the expected information gain (EIG) of an experiment. To address this, we introduce several classes of fast EIG estimators by building on ideas from amortized variational inference. We show theoretically and empirically that these estimators can provide significant gains in speed and accuracy over previous approaches. We further demonstrate the practicality of our approach on a number of end-to-end experiments.
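As a concrete point of reference (not the authors' estimators), the following NumPy/SciPy sketch computes one standard variational EIG lower bound, the Barber-Agakov "posterior" bound, for a toy linear-Gaussian design problem where the bound can be checked against the closed-form EIG; in general the variational posterior q would be a learned, amortized approximation.

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch of a variational EIG lower bound (Barber-Agakov "posterior"
# bound), not the paper's implementation: for design d with model
# y ~ N(d * theta, 1) and prior theta ~ N(0, 1), the bound is
#   EIG(d) >= E_{p(theta) p(y|theta,d)}[ log q(theta | y, d) ] + H[p(theta)],
# tight when q equals the true posterior. Here q is the exact conjugate
# posterior, so the Monte Carlo estimate should match the closed-form EIG.

def eig_lower_bound(d, n_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, n_samples)          # prior draws
    y = d * theta + rng.normal(0.0, 1.0, n_samples)  # simulated outcomes
    var_post = 1.0 / (1.0 + d**2)                    # conjugate posterior variance
    mu_post = var_post * d * y                       # conjugate posterior mean
    log_q = norm.logpdf(theta, loc=mu_post, scale=np.sqrt(var_post))
    prior_entropy = 0.5 * np.log(2 * np.pi * np.e)   # entropy of N(0, 1)
    return log_q.mean() + prior_entropy

for d in [0.5, 1.0, 2.0]:
    print(d, eig_lower_bound(d), 0.5 * np.log(1 + d**2))  # MC bound vs. closed form
```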
The goal of this paper is to make Optimal Experimental Design (OED) computationally feasible for problems involving significant computational expense. We focus exclusively on the Mean Objective Cost of Uncertainty (MOCU), a specific methodology for OED, and propose extensions to MOCU that leverage surrogates and adaptive sampling. In particular, we aim to reduce the computational expense of evaluating a large set of control policies across a large set of uncertain variables. We do so by approximating the intermediate calculations associated with each parameter/control pair with a surrogate. This surrogate is constructed from sparse sampling and (possibly) refined adaptively through a combination of sensitivity estimation and probabilistic knowledge gained directly from the experimental measurements prescribed by MOCU. We demonstrate our methods on example problems and compare performance against surrogate-approximated MOCU with no adaptive sampling and against full MOCU. We find evidence that adaptive sampling improves performance, but the decision of whether to use surrogate-approximated MOCU versus full MOCU depends on the relative expense of computation versus experimentation: if computation is more expensive than experimentation, then one should consider using our approach.
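For readers unfamiliar with the criterion, a minimal sketch of the MOCU quantity itself is given below; the cost table, prior, and dimensions are synthetic, and the paper's surrogate and adaptive-sampling machinery is not shown, only the expectation that machinery is meant to make cheap to evaluate.

```python
import numpy as np

# Minimal sketch of MOCU on a discrete problem: given cost[i, j] for uncertain
# parameter value i under control policy j, and prior weights over parameters,
#   MOCU = E_theta[ cost(theta, psi_robust) - cost(theta, psi_theta) ],
# where psi_robust minimizes expected cost and psi_theta is optimal for theta.
# In the paper each cost evaluation may be expensive, which is what the
# surrogate approximates; here the cost table is just synthetic data.

def mocu(cost: np.ndarray, prior: np.ndarray) -> float:
    expected_cost = prior @ cost              # expected cost of each policy
    psi_robust = expected_cost.argmin()       # best policy under uncertainty
    best_per_theta = cost.min(axis=1)         # cost of the theta-specific optimum
    return float(prior @ (cost[:, psi_robust] - best_per_theta))

rng = np.random.default_rng(0)
cost = rng.uniform(0.0, 1.0, size=(5, 4))     # 5 parameter values x 4 policies (synthetic)
prior = np.full(5, 0.2)                       # uniform prior over parameter values
print(mocu(cost, prior))
```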
Just-in-time adaptive interventions (JITAIs) are time-varying adaptive interventions that use frequent opportunities for the intervention to be adapted such as weekly, daily, or even many times a day. This high intensity of adaptation is facilitated by the ability of digital technology to continuously collect information about an individual's current context and deliver treatments adapted to this information. The micro-randomized trial (MRT) has emerged as a trial design for informing the construction of JITAIs. MRTs operate in, and take advantage of, the rapidly time-varying digital intervention environment. MRTs can be used to address research questions about whether and under what circumstances particular components of a JITAI are effective, with the ultimate objective of developing effective and efficient components. The purpose of this article is to clarify why, when, and how to use MRTs; to highlight elements that must be considered when designing and implementing an MRT; and to discuss the possibilities this emerging optimization trial design offers for future research in the behavioral sciences, education, and other fields. We briefly review key elements of JITAIs, and then describe three case studies of MRTs, each of which highlights research questions that can be addressed using the MRT and experimental design considerations that might arise. We also discuss a variety of considerations that go into planning and designing an MRT, using the case studies as examples.
The hematopoietic system has a highly regulated and complex structure in which cells are organized to successfully create and maintain new blood cells. Feedback regulation is crucial to tightly control this system, but the specific mechanisms by which control is exerted are not completely understood. In this work, we aim to uncover the underlying mechanisms in hematopoiesis by conducting perturbation experiments, where animal subjects are exposed to an external agent in order to observe the system response and evolution. Developing a proper experimental design for these studies is an extremely challenging task. To address this issue, we have developed a novel Bayesian framework for optimal design of perturbation experiments. We model the numbers of hematopoietic stem and progenitor cells in mice that are exposed to a low dose of radiation. We use a differential equations model that accounts for feedback and feedforward regulation. A significant obstacle is that the experimental data are not longitudinal; rather, each data point corresponds to a different animal. This model is embedded in a hierarchical framework with latent variables that capture unobserved cellular population levels. We select the optimal design based on the amount of information gain, measured by the Kullback-Leibler divergence between the probability distributions before and after observing the data. We evaluate our approach using synthetic and experimental data. We show that a proper design can lead to better estimates of model parameters even with relatively few subjects. Additionally, we demonstrate that the model parameters show a wide range of sensitivities to design options. Our method should allow scientists to find the optimal design by focusing on their specific parameters of interest and provide insight into hematopoiesis. Our approach can be extended to more complex models where latent components are used.
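The design criterion itself can be sketched compactly; the toy Poisson model, grid prior, and candidate designs below are stand-ins for the paper's hierarchical ODE model, shown only to illustrate scoring each design by the expected Kullback-Leibler divergence between prior and posterior.

```python
import numpy as np

# Illustrative sketch of the design criterion only: for each candidate design,
# simulate data from the prior predictive, compute the posterior on a parameter
# grid, and score the design by the average KL(posterior || prior). The design
# with the largest expected KL is expected to be most informative. The Poisson
# observation model and the "design scale" are synthetic stand-ins.

rng = np.random.default_rng(0)
grid = np.linspace(0.1, 5.0, 200)                     # parameter grid (rate)
prior = np.ones_like(grid) / grid.size                # uniform prior on the grid

def expected_kl(design_scale: float, n_sim: int = 2000) -> float:
    kls = []
    for _ in range(n_sim):
        theta = rng.choice(grid, p=prior)             # draw parameter from the prior
        y = rng.poisson(design_scale * theta)         # simulate data under this design
        # Poisson log-likelihood over the grid (log y! is constant in theta, dropped)
        loglik = y * np.log(design_scale * grid) - design_scale * grid
        post = prior * np.exp(loglik - loglik.max())
        post /= post.sum()
        kls.append(np.sum(post * np.log(post / prior + 1e-300)))
    return float(np.mean(kls))

for d in [0.5, 1.0, 4.0]:                             # candidate designs
    print(d, expected_kl(d))
```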
Inferring the causal structure of a system typically requires interventional data, rather than just observational data. Since interventional experiments can be costly, it is preferable to select interventions that yield the maximum amount of information about a system. We propose a novel Bayesian method for optimal experimental design by sequentially selecting interventions that minimize the expected posterior entropy as rapidly as possible. A key feature is that the method can be implemented by computing simple summaries of the current posterior, avoiding the computationally burdensome task of repeatedly performing posterior inference on hypothetical future datasets drawn from the posterior predictive. After deriving the method in a general setting, we apply it to the problem of inferring causal networks. We present a series of simulation studies in which we find that the proposed method performs favorably compared to existing alternative methods. Finally, we apply the method to real and simulated data from a protein-signaling network.
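For context, a minimal sketch of the expected-posterior-entropy criterion over a discrete hypothesis space is shown below. Note that it enumerates hypothetical outcomes, which is exactly the kind of computation the paper's posterior-summary approach is designed to avoid; the hypotheses, interventions, and likelihood table are all synthetic.

```python
import numpy as np

# Naive formulation of the criterion: choose the intervention whose expected
# posterior entropy (averaged over the posterior predictive of its outcome)
# is smallest. lik[h, a, y] = probability of outcome y under hypothesis h
# when intervention a is performed (synthetic, rows normalized over y).

rng = np.random.default_rng(0)
n_hyp, n_interv, n_outcomes = 8, 4, 3
lik = rng.dirichlet(np.ones(n_outcomes), size=(n_hyp, n_interv))
posterior = np.full(n_hyp, 1.0 / n_hyp)               # current posterior over hypotheses

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_posterior_entropy(a: int) -> float:
    pred = posterior @ lik[:, a, :]                   # posterior predictive over outcomes
    h_exp = 0.0
    for y in range(n_outcomes):
        post_y = posterior * lik[:, a, y]             # Bayes update for outcome y
        post_y /= post_y.sum()
        h_exp += pred[y] * entropy(post_y)            # weight by predictive probability
    return h_exp

scores = [expected_posterior_entropy(a) for a in range(n_interv)]
print("choose intervention", int(np.argmin(scores)), scores)
```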