A central goal of the design of observational studies is to embed non-experimental data into an approximate randomized controlled trial using statistical matching. Researchers then make the randomization assumption in their downstream outcome analysis. For a matched-pair design, the randomization assumption states that treatment assignments across matched pairs are independent, and that within each pair the probability that the first subject receives treatment and the second receives control equals the probability that the first receives control and the second receives treatment. In this article, we develop a novel framework for testing the randomization assumption by solving a clustering problem with side information using modern statistical learning tools. Our testing framework is nonparametric, finite-sample exact, and distinct from previous proposals in that it can be used to test a relaxed version of the randomization assumption called the biased randomization assumption. An important by-product of our testing framework is a quantity called the residual sensitivity value (RSV), which quantifies the minimal level of residual confounding due to observed covariates not being well matched. We advocate taking the RSV into account in the downstream primary analysis. The proposed methodology is illustrated by re-examining a famous observational study of the effect of right heart catheterization (RHC) in the initial care of critically ill patients.
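To give the flavor of such a test, here is a minimal sketch (our own illustration, not the authors' exact procedure): under the randomization assumption, which unit in a pair received treatment is a fair coin flip, so a classifier trained to recover the treated unit from within-pair covariate differences should do no better than chance, and its held-out accuracy admits an exact binomial test. All data, model choices, and parameters below are hypothetical.

    import numpy as np
    from scipy.stats import binomtest
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Hypothetical matched pairs: each row holds within-pair covariate
    # differences (treated minus control). Under randomization plus good
    # matching, these differences are distributed symmetrically about zero.
    n_pairs, p = 500, 5
    diff = rng.normal(0, 1, size=(n_pairs, p))
    diff[:, 0] += 0.3  # residual imbalance in one covariate

    # Randomly relabel which unit is "first" in each pair; predicting the
    # label amounts to predicting which unit received treatment.
    flip = rng.integers(0, 2, n_pairs)
    X = diff * np.where(flip == 1, -1.0, 1.0)[:, None]
    y = flip

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5,
                                              random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    n_correct = int((clf.predict(X_te) == y_te).sum())

    # Finite-sample exact test: under the randomization assumption the
    # held-out accuracy is Binomial(n, 1/2). A biased-randomization null
    # would replace 1/2 with a bound such as Gamma / (1 + Gamma).
    print(binomtest(n_correct, len(y_te), p=0.5, alternative="greater"))

The finite-sample exactness comes from the binomial null distribution of the held-out accuracy, regardless of the classifier used.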
Mendelian randomization (MR) is a popular instrumental variable (IV) approach, in which one or more genetic markers serve as IVs that can sometimes be leveraged to recover valid inference about a given exposure-outcome causal association subject to unmeasured confounding. A key IV identification condition, known as the exclusion restriction, states that the IV cannot have a direct effect on the outcome that is not mediated by the exposure in view. In MR studies, such an assumption requires an unrealistic level of prior knowledge about the mechanism by which genetic markers causally affect the outcome. As a result, possible violation of the exclusion restriction can seldom be ruled out in practice. To address this concern, we introduce a new class of IV estimators which are robust to violation of the exclusion restriction under data-generating mechanisms commonly assumed in the MR literature. The proposed approach, named MR G-Estimation under No Interaction with Unmeasured Selection (MR GENIUS), improves on Robins' G-estimation by making it robust to both additive unmeasured confounding and violation of the exclusion restriction assumption. In certain key settings, MR GENIUS reduces to the estimator of Lewbel (2012), which is widely used in econometrics but appears largely unappreciated in the MR literature. More generally, MR GENIUS generalizes Lewbel's estimator to several key practical MR settings, including multiplicative causal models for a binary outcome, multiplicative and odds-ratio exposure models, case-control study designs, and censored survival outcomes.
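The core of the Lewbel/GENIUS identification strategy can be illustrated with a short simulation (a sketch under assumed data-generating choices, not code from the paper): with a single binary IV G, the estimator solves the empirical analogue of E[(G - EG)(A - E[A|G])(Y - beta*A)] = 0, which remains valid under a direct pleiotropic effect of G on Y provided the conditional variance of the exposure depends on G.

    import numpy as np

    rng = np.random.default_rng(1)
    n, beta = 200_000, 0.7

    # Hypothetical MR data: binary IV G, unmeasured confounder U, exposure A
    # whose variance depends on G (the heteroscedasticity being exploited),
    # and a direct effect 0.4*G on Y violating the exclusion restriction.
    G = rng.integers(0, 2, n)
    U = rng.normal(0, 1, n)
    A = 0.5 * G + U + (1 + 0.8 * G) * rng.normal(0, 1, n)
    Y = beta * A + 0.4 * G + U + rng.normal(0, 1, n)

    # GENIUS/Lewbel-type moment: solve E[(G-EG)(A-E[A|G])(Y-b*A)] = 0.
    # With one binary IV, E[A|G] is just the within-group mean of A.
    EA_given_G = np.where(G == 1, A[G == 1].mean(), A[G == 0].mean())
    w = (G - G.mean()) * (A - EA_given_G)
    beta_genius = np.sum(w * Y) / np.sum(w * A)

    # The standard Wald ratio estimator is biased by the direct G -> Y path.
    beta_wald = np.cov(G, Y)[0, 1] / np.cov(G, A)[0, 1]
    print(beta_genius, beta_wald)  # ~0.7 versus a markedly biased value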
Response-adaptive randomization is appealing in confirmatory adaptive clinical trials from statistical, ethical, and pragmatic perspectives, in the sense that subjects are more likely to be randomized to better-performing treatment groups based on accumulating data. The Doubly Adaptive Biased Coin Design (DBCD) is a popular solution because its final allocation proportions are asymptotically normal, which further justifies its asymptotic type I error rate control. As an alternative, we propose a Response Adaptive Block Randomization (RABR) design with pre-specified randomization ratios for the control and high-performing groups, which robustly achieves the desired final sample size per group under different underlying responses, as is usually required in industry-sponsored clinical studies. We show that the usual test statistic controls the type I error rate. Our simulations further highlight the advantages of the proposed design over the DBCD in consistently achieving the targeted final sample allocations and in power. We further apply this design to a Phase III study evaluating the efficacy of two dosing regimens of adjunctive everolimus in treating tuberous sclerosis complex, an indication with no previous dose-finding studies.
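A stylized simulation of the block-randomization mechanic follows; all arm names, response means, ratios, and block counts are hypothetical and not taken from the abstract's trial. The key idea is that the ratio assigned to each active arm is pre-specified, while which arm receives the larger ratio is decided adaptively from accumulating responses.

    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical three-arm trial: one control and two active doses, with
    # pre-specified block ratios 2:2:1 for control, the currently
    # better-performing active arm, and the other active arm.
    true_means = {"control": 0.0, "low": 0.3, "high": 0.5}
    ratios = {"control": 2, "better": 2, "worse": 1}
    burn_in, n_blocks = 30, 40

    # Burn-in: equal randomization to obtain initial response estimates.
    data = {a: list(rng.normal(m, 1.0, burn_in))
            for a, m in true_means.items()}

    for _ in range(n_blocks):
        # Rank the active arms by current observed mean response.
        act = sorted(["low", "high"], key=lambda a: np.mean(data[a]),
                     reverse=True)
        block = (["control"] * ratios["control"]
                 + [act[0]] * ratios["better"]
                 + [act[1]] * ratios["worse"])
        rng.shuffle(block)  # randomize order within the block
        for a in block:
            data[a].append(rng.normal(true_means[a], 1.0))

    print({a: len(v) for a, v in data.items()})  # final sample size per arm

Because the block composition is fixed in advance, the per-group sample sizes are essentially deterministic, in contrast to the random final allocations of DBCD-type designs.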
Mendelian randomization (MR) has become a popular approach to studying causal effects by using genetic variants as instrumental variables. We propose a new MR method, GENIUS-MAWII, which simultaneously addresses the two salient phenomena that adversely affect MR analyses: many weak instruments and widespread horizontal pleiotropy. Similar to MR GENIUS (Tchetgen Tchetgen et al., 2019), we achieve identification of the treatment effect by leveraging heteroscedasticity of the exposure. We then derive the class of influence functions of the treatment effect, based on which we construct a continuous updating estimator and establish its consistency and asymptotic normality under a many-weak-invalid-instruments asymptotic regime by developing novel semiparametric theory. We also provide a measure of weak identification and a graphical diagnostic tool. We demonstrate in simulations that GENIUS-MAWII has clear advantages over other methods in the presence of directional or correlated horizontal pleiotropy. We apply our method to study the effect of body mass index on systolic blood pressure using UK Biobank data.
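A minimal sketch of a continuous updating estimator built on GENIUS-type moment functions appears below; the simulation design, the linear working model for E[A|G], and all parameters are assumptions of this illustration, not the paper's specification.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(3)
    n, k, beta = 50_000, 10, 0.7

    # Hypothetical many-weak-instrument MR data: each of k binary IVs has a
    # small effect on the exposure mean and variance and a direct
    # (pleiotropic) effect on the outcome.
    G = rng.integers(0, 2, (n, k))
    U = rng.normal(0, 1, n)
    scale = 1 + G @ rng.uniform(0.05, 0.15, k)  # heteroscedastic exposure
    A = G @ rng.uniform(0.0, 0.1, k) + U + scale * rng.normal(0, 1, n)
    Y = beta * A + G @ rng.uniform(0.0, 0.1, k) + U + rng.normal(0, 1, n)

    # E[A|G] fitted linearly (a modeling assumption of this sketch).
    X = np.column_stack([np.ones(n), G])
    resA = A - X @ np.linalg.lstsq(X, A, rcond=None)[0]
    Gc = G - G.mean(axis=0)

    def cue_objective(b):
        # Moment functions g_j = (G_j - EG_j)(A - E[A|G])(Y - b*A), with the
        # weight matrix re-estimated at each b (continuous updating).
        g = Gc * (resA * (Y - b * A))[:, None]
        gbar = g.mean(axis=0)
        W = np.cov(g, rowvar=False)
        return n * gbar @ np.linalg.solve(W, gbar)

    res = minimize_scalar(cue_objective, bounds=(-2, 3), method="bounded")
    print(res.x)  # should land near the true beta of 0.7

The continuous updating of the weight matrix is what distinguishes this estimator from two-step GMM and underlies its favorable behavior with many weak moments.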
In this paper, we study estimation and inference for the quantile treatment effect under covariate-adaptive randomization. We propose two estimation methods: (1) simple quantile regression and (2) inverse-propensity-score-weighted quantile regression. For both estimators, we derive their asymptotic distributions uniformly over a compact set of quantile indexes and show that, when the treatment assignment rule does not achieve strong balance, the inverse-propensity-score-weighted estimator has a smaller asymptotic variance than the simple quantile regression estimator. For inference, we show that the Wald test using a weighted bootstrap standard error under-rejects for method (1) but has asymptotic size equal to the nominal level for method (2). We also show that, for both methods, the asymptotic size of the Wald test using a covariate-adaptive bootstrap standard error equals the nominal level. We illustrate the finite-sample performance of the new estimation and inference methods using both simulated and real datasets.
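The mechanics of the two estimators can be seen in a short sketch (a hypothetical stratified-block design of our own, not the paper's data): with a binary treatment indicator, simple quantile regression of the outcome on an intercept and the indicator reduces to a difference of marginal sample quantiles, while the weighted version replaces them with quantiles of the inverse-propensity-weighted distribution.

    import numpy as np

    rng = np.random.default_rng(4)
    n, tau = 4000, 0.5

    # Hypothetical stratified block (covariate-adaptive) randomization:
    # within each of 4 strata, exactly half the subjects are treated.
    strata = rng.integers(0, 4, n)
    D = np.zeros(n, dtype=int)
    for s in range(4):
        idx = np.flatnonzero(strata == s)
        rng.shuffle(idx)
        D[idx[: len(idx) // 2]] = 1
    Y = strata + rng.normal(0, 1, n) + D * (1.0 + 0.5 * rng.normal(0, 1, n))

    # (1) Simple quantile regression of Y on (1, D): with a binary regressor
    # this reduces to the difference of marginal sample quantiles.
    qte_qr = np.quantile(Y[D == 1], tau) - np.quantile(Y[D == 0], tau)

    # (2) Inverse propensity score weighting with stratum-level estimated
    # propensities; weighted quantiles come from the weighted CDF.
    pi_hat = np.array([D[strata == s].mean() for s in range(4)])[strata]

    def wquantile(y, w, q):
        o = np.argsort(y)
        cw = np.cumsum(w[o]) / w.sum()
        return y[o][np.searchsorted(cw, q)]

    qte_ipw = (wquantile(Y, D / pi_hat, tau)
               - wquantile(Y, (1 - D) / (1 - pi_hat), tau))
    print(qte_qr, qte_ipw)  # both near the true median effect of 1.0

Because this toy design achieves strong balance within strata, the two estimates nearly coincide; the paper's variance comparison concerns designs that do not.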
Natural and social multivariate systems are commonly studied through sets of simultaneous and time-spaced measurements of the observables that drive their dynamics, i.e., through sets of time series. Typically, this is done via hypothesis testing: the statistical properties of the empirical time series are tested against those expected under a suitable null hypothesis. This is a very challenging task in complex interacting systems, where statistical stability is often poor due to lack of stationarity and ergodicity. Here, we describe an unsupervised, data-driven framework to perform hypothesis testing in such situations. It consists of a statistical-mechanical approach, analogous to the configuration model for networked systems, for ensembles of time series designed to preserve, on average, some of the statistical properties observed in an empirical set of time series. We showcase its possible applications with a case study on financial portfolio selection.
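One concrete instance of such an ensemble, chosen for this sketch and not necessarily the constraint set used in the paper: the maximum-entropy ensemble that preserves means and covariances on average is multivariate Gaussian, and surrogate draws from it provide a null distribution for any statistic of the empirical series. All data and parameters below are hypothetical.

    import numpy as np

    rng = np.random.default_rng(5)
    N, T, n_surr = 8, 500, 2000

    # Hypothetical "empirical" data: N assets, T observations of correlated,
    # heavy-tailed returns.
    mix = rng.normal(0, 0.3, (N, N))
    returns = rng.standard_t(df=4, size=(T, N)) @ mix

    # The max-entropy ensemble constrained to preserve means and covariances
    # on average is multivariate Gaussian; other constraint sets would yield
    # other ensembles in the same spirit.
    mu = returns.mean(axis=0)
    Sigma = np.cov(returns, rowvar=False)

    def leading_eig(x):
        # Test statistic: leading eigenvalue of the correlation matrix, a
        # common probe of collective modes in financial data.
        return np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[-1]

    stat_emp = leading_eig(returns)
    stat_null = np.array([leading_eig(rng.multivariate_normal(mu, Sigma, T))
                          for _ in range(n_surr)])
    p_value = (1 + np.sum(stat_null >= stat_emp)) / (1 + n_surr)
    print(stat_emp, p_value)

The same recipe, with the ensemble swapped for one preserving different properties on average, yields data-driven null models for portfolio-level statistics such as risk estimates or optimal weights.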