Missing at Random or Not: A Semiparametric Testing Approach


Abstract in English

Missing data problems are common in practice, and statistical methods have been developed to address the validity and/or efficiency of statistical procedures in their presence. A central and longstanding interest is the mechanism governing data missingness, and correctly identifying that mechanism is crucial for conducting proper practical investigations. Conventionally, three classes of mechanisms are distinguished: missing completely at random, missing at random, and missing not at random. In this paper, we present a new hypothesis testing approach for deciding between missing at random and missing not at random. Since the class of potential alternatives to missing at random is broad, we focus our investigation on a general class of models with instrumental variables for data missing not at random. Our setting is broadly applicable because the model for the missing data is nonparametric, requiring no explicit model specification for the data missingness. The foundational idea is to develop appropriate discrepancy measures between estimators whose properties differ significantly only when missing at random does not hold. We show that our new hypothesis testing approach achieves an objective, data-oriented choice between missing at random and missing not at random. We demonstrate the feasibility, validity, and efficacy of the new test through theoretical analysis, simulation studies, and a real data analysis.
