
Multiple imputation of covariates by fully conditional specification: accommodating the substantive model

Added by Jonathan Bartlett
Publication date: 2012
Research language: English





Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using multiple imputation (MI). Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of MI may impute covariates from models that are incompatible with such substantive models. We show how imputation by fully conditional specification, a popular approach for performing MI, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it to existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed imputation models are correctly specified and mutually compatible.
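The abstract's core idea is that the model used to impute a covariate should reflect the non-linear terms of the substantive model. Below is a minimal, illustrative chained-equations pass in Python (NumPy only) showing one such imputation step: the covariate x is imputed from a model whose design matrix includes a squared term. This is a sketch of the general FCS idea under assumed simulated data, not the rejection-sampling algorithm proposed in the paper; the function name `fcs_impute` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data where the substantive model for y contains a squared term
# in x -- exactly the situation in which a default linear imputation model
# becomes incompatible with the substantive model.
n = 500
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(scale=0.5, size=n)

# Make x missing at random, with missingness depending only on observed y.
miss = rng.random(n) < 0.3
x_obs = np.where(miss, np.nan, x)

def fcs_impute(x_obs, y, rng):
    """One FCS-style stochastic imputation of x given y.

    Illustrative only: the imputation design matrix includes y and y**2
    so the conditional model can capture non-linearity, mimicking (very
    loosely) the compatibility requirement discussed in the abstract.
    """
    m = np.isnan(x_obs)
    X = np.column_stack([np.ones_like(y), y, y**2])
    # Fit the imputation model on the complete cases.
    beta, *_ = np.linalg.lstsq(X[~m], x_obs[~m], rcond=None)
    sigma = (x_obs[~m] - X[~m] @ beta).std()
    # Proper MI draws missing values stochastically, not as point predictions.
    x_imp = x_obs.copy()
    x_imp[m] = X[m] @ beta + rng.normal(scale=sigma, size=m.sum())
    return x_imp

x_imp = fcs_impute(x_obs, y, rng)
```

In a real analysis this conditional draw would be cycled over all partially observed covariates, and the authors' proposal additionally ensures each conditional is compatible with the substantive model rather than merely flexible.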



Related research

Missing data is a common problem in clinical data collection, which causes difficulty in the statistical analysis of such data. To overcome problems caused by incomplete data, we propose a new imputation method called projective resampling imputation mean estimation (PRIME), which can also address the "curse of dimensionality" problem in imputation with less information loss. We use various sample sizes, missing-data rates, covariate correlations, and noise levels in simulation studies, and all results show that PRIME outperforms other methods such as iterative least-squares estimation (ILSE), maximum likelihood (ML), and complete-case analysis (CC). Moreover, we conduct a study of influential factors in cardiac surgery-associated acute kidney injury (CSA-AKI), which shows that our method performs better than the other models. Finally, we prove that PRIME is consistent under some regularity conditions.
Multiple imputation has become one of the most popular approaches for handling missing data in statistical analyses. Part of this success is due to Rubin's simple combination rules. These give frequentist-valid inferences when the imputation and analysis procedures are so-called congenial and the complete-data analysis is valid, but otherwise may not. Roughly speaking, congeniality corresponds to whether the imputation and analysis models make different assumptions about the data. In practice, imputation and analysis procedures are often not congenial, such that tests may not have the correct size and confidence interval coverage deviates from the advertised level. We examine a number of recent proposals which combine bootstrapping with multiple imputation, and determine which are valid under uncongeniality and model misspecification. Imputation followed by bootstrapping generally does not result in valid variance estimates under uncongeniality or misspecification, whereas bootstrapping followed by imputation does. We recommend a particular computationally efficient variant of bootstrapping followed by imputation.
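The ordering this abstract recommends, resample first, then impute within each bootstrap replicate, can be sketched in a few lines. The imputation step here is a deliberately simple stochastic normal draw standing in for a real imputation model; the helper name `draw_impute` and all constants are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_impute(col, rng):
    """Stochastic single imputation from the observed-data normal fit
    (an illustrative stand-in for a proper imputation model)."""
    out = col.copy()
    m = np.isnan(out)
    out[m] = rng.normal(np.nanmean(col), np.nanstd(col), m.sum())
    return out

# Incomplete sample: ~25% of values missing.
x = rng.normal(loc=2.0, size=300)
x[rng.random(300) < 0.25] = np.nan

# Bootstrapping followed by imputation: resample the INCOMPLETE data
# first, then impute within each bootstrap replicate.
B, M = 200, 2
boot_est = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=x.size, replace=True)  # resample incl. NaNs
    boot_est[b] = np.mean([draw_impute(xb, rng).mean() for _ in range(M)])

point = np.mean(boot_est)       # bootstrap point estimate of the mean
se = np.std(boot_est, ddof=1)   # bootstrap standard error
```

The key design point, per the abstract, is that the bootstrap resampling happens on the incomplete data, so the variance estimate reflects both sampling and imputation uncertainty even when imputation and analysis models are uncongenial.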
Accurate stochastic simulations of hourly precipitation are needed for impact studies at local spatial scales. Statistically, hourly precipitation data represent a difficult challenge. They are non-negative, skewed, heavy tailed, contain a lot of zeros (dry hours) and they have complex temporal structures (e.g., long persistence of dry episodes). Inspired by frailty-contagion approaches used in finance and insurance, we propose a multi-site precipitation simulator that, given appropriate regional atmospheric variables, can simultaneously handle dry events and heavy rainfall periods. One advantage of our model is its conceptual simplicity in its dynamical structure. In particular, the temporal variability is represented by a common factor based on a few classical atmospheric covariates like temperatures, pressures and others. Our inference approach is tested on simulated data and applied on measurements made in the northern part of French Brittany.
Censored survival data are common in clinical trial studies. We propose a unified framework for sensitivity analysis to censoring at random in survival data using multiple imputation and martingales, called SMIM. The proposed framework adopts delta-adjusted and control-based models, indexed by the sensitivity parameter, entailing censoring at random and a wide collection of censoring-not-at-random assumptions. It also targets a broad class of treatment effect estimands defined as functionals of treatment-specific survival functions, taking into account missing data due to censoring. Multiple imputation facilitates the use of simple full-sample estimation; however, the standard Rubin's combining rule may overestimate the variance for inference in the sensitivity analysis framework. We decompose the multiple imputation estimator into a martingale series based on the sequential construction of the estimator and propose wild bootstrap inference by resampling the martingale series. The new bootstrap inference has a theoretical guarantee of consistency and is computationally efficient compared to the non-parametric bootstrap counterpart. We evaluate the finite-sample performance of the proposed SMIM through simulation and an application to an HIV clinical trial.
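Rubin's combining rule, which this abstract argues can overestimate variance in the sensitivity-analysis setting, is itself standard and worth stating concretely. A minimal plain-Python sketch (the function name `rubin_combine` is our own):

```python
def rubin_combine(estimates, variances):
    """Pool M completed-data analyses with Rubin's rules.

    estimates: list of M point estimates, one per imputed data set
    variances: list of M completed-data variance estimates
    """
    M = len(estimates)
    qbar = sum(estimates) / M                                # pooled estimate
    ubar = sum(variances) / M                                # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (M - 1)    # between-imputation variance
    total = ubar + (1 + 1 / M) * b                           # total variance
    return qbar, total

est, var = rubin_combine([1.0, 1.2, 0.8], [0.1, 0.1, 0.1])
```

The between-imputation term b is what inflates the pooled variance; the abstract's point is that in its sensitivity-analysis framework this inflation can be too large, motivating the martingale-based wild bootstrap instead.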
Count data are collected in many scientific and engineering tasks, including image processing, single-cell RNA sequencing, and ecological studies. Such data sets often contain missing values, for example because some ecological sites cannot be reached in a certain year. In addition, in many instances, side information is also available, for example covariates about ecological sites or species. Low-rank methods are popular for denoising and imputing count data, and benefit from a substantial theoretical background. Extensions accounting for covariates have been proposed, but to the best of our knowledge their theoretical and empirical properties have not been thoroughly studied, and little software is available for practitioners. We propose a complete methodology called LORI (Low-Rank Interaction), including a Poisson model, an algorithm, and automatic selection of the regularization parameter, to analyze count tables with covariates. We also derive an upper bound on the estimation error. We provide a simulation study with synthetic data, revealing empirically that LORI improves on state-of-the-art methods in terms of estimation and imputation of the missing values. We illustrate how the method can be interpreted through visual displays with the analysis of a well-known plant abundance data set, and show that the LORI outputs are consistent with known results. Finally, we demonstrate the relevance of the methodology by analyzing a water-birds abundance table from the French national agency for wildlife and hunting management (ONCFS). The method is available in the R package lori on the Comprehensive R Archive Network (CRAN).
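To make the low-rank imputation idea concrete, here is an illustrative EM-style truncated-SVD imputer in Python. It is a Gaussian surrogate for the kind of low-rank completion LORI performs, not LORI's Poisson model with covariates; the function name `lowrank_impute` and the toy rank-1 table are assumptions for illustration.

```python
import numpy as np

def lowrank_impute(Y, rank=1, n_iter=200):
    """Impute missing entries of a count table by iterating:
    fill -> truncated SVD -> refill missing cells from the low-rank fit.

    Illustrative Gaussian surrogate for a Poisson low-rank model.
    """
    mask = np.isnan(Y)
    Z = np.where(mask, np.nanmean(Y), Y)       # crude initial fill
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Zhat = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # rank-r approximation
        Z = np.where(mask, Zhat, Y)            # keep observed cells fixed
    return np.clip(Z, 0.0, None)               # counts are non-negative

# Toy rank-1 "abundance table" with one missing cell.
Y = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
Y[0, 0] = np.nan
Z = lowrank_impute(Y)
```

Because the toy table is exactly rank 1, the iteration recovers the missing cell (true value 1.0) while leaving observed cells untouched; LORI's contribution is the Poisson likelihood, covariate effects, and automatic regularization on top of this basic completion principle.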
