ﻻ يوجد ملخص باللغة العربية
Multiple imputation (MI) is the state-of-the-art approach for dealing with missing data arising from non-response in sample surveys. Multiple imputation by chained equations (MICE) is the most widely used MI method, but it lacks theoretical foundation and is computationally intensive. Recently, MI methods based on deep learning models have been developed with encouraging results in small studies. However, there has been limited research on systematically evaluating their performance in realistic settings comparing to MICE, particularly in large-scale surveys. This paper provides a general framework for using simulations based on real survey data and several performance metrics to compare MI methods. We conduct extensive simulation studies based on the American Community Survey data to compare repeated sampling properties of four machine learning based MI methods: MICE with classification trees, MICE with random forests, generative adversarial imputation network, and multiple imputation using denoising autoencoders. We find the deep learning based MI methods dominate MICE in terms of computational time; however, MICE with classification trees consistently outperforms the deep learning MI methods in terms of bias, mean squared error, and coverage under a range of realistic settings.
Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks. We experime
Several statistical models are given in the form of unnormalized densities, and calculation of the normalization constant is intractable. We propose estimation methods for such unnormalized models with missing data. The key concept is to combine impu
We consider the topic of data imputation, a foundational task in machine learning that addresses issues with missing data. To that end, we propose MCFlow, a deep framework for imputation that leverages normalizing flow generative models and Monte Car
The COVID-19 pandemic represents the most significant public health disaster since the 1918 influenza pandemic. During pandemics such as COVID-19, timely and reliable spatio-temporal forecasting of epidemic dynamics is crucial. Deep learning-based ti
Huge neural network models have shown unprecedented performance in real-world applications. However, due to memory constraints, model parallelism must be utilized to host large models that would otherwise not fit into the memory of a single device. P