We consider the problem of handling missing data with deep latent variable models (DLVMs). First, we present a simple technique to train DLVMs when the training set contains missing-at-random data. Our approach, called MIWAE, is based on the importance-weighted autoencoder (IWAE), and maximises a potentially tight lower bound of the log-likelihood of the observed data. Compared to the original IWAE, our algorithm does not induce any additional computational overhead due to the missing data. We also develop Monte Carlo techniques for single and multiple imputation using a DLVM trained on an incomplete data set. We illustrate our approach by training a convolutional DLVM on a static binarisation of MNIST that contains 50% of missing pixels. Leveraging multiple imputation, a convolutional network trained on these incomplete digits has a test performance similar to one trained on complete data. On various continuous and binary data sets, we also show that MIWAE provides accurate single imputations, and is highly competitive with state-of-the-art methods.
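As a concrete illustration of the objective described above, the following is a minimal PyTorch-style sketch of an importance-weighted bound computed on the observed entries only. The `encoder` and `decoder` modules, the Gaussian observation model, and all shapes are illustrative assumptions rather than the authors' reference implementation.

```python
# Minimal sketch of a MIWAE-style bound, assuming a Gaussian observation
# model and zero-imputation of missing entries at the encoder input.
# `encoder` and `decoder` are hypothetical modules returning Gaussian
# parameters; this is not the authors' reference implementation.
import math
import torch
import torch.distributions as td

def miwae_bound(x, mask, encoder, decoder, K=20):
    """Importance-weighted lower bound on log p(x_obs).

    x    : (batch, d) tensor, arbitrary values at missing entries
    mask : (batch, d) binary tensor, 1 = observed
    """
    x_filled = x * mask                    # zero-impute the missing entries
    mu_z, log_sd_z = encoder(x_filled)     # parameters of q(z | x_obs)
    qz = td.Normal(mu_z, log_sd_z.exp())
    z = qz.rsample((K,))                   # (K, batch, latent_dim)

    mu_x, log_sd_x = decoder(z)            # parameters of p(x | z)
    px = td.Normal(mu_x, log_sd_x.exp())
    log_px = (px.log_prob(x) * mask).sum(-1)   # observed dimensions only

    pz = td.Normal(torch.zeros_like(z), torch.ones_like(z))
    log_w = log_px + pz.log_prob(z).sum(-1) - qz.log_prob(z).sum(-1)
    # logsumexp over the K importance samples: tighter as K grows, as in IWAE
    return (torch.logsumexp(log_w, dim=0) - math.log(K)).mean()
```

Because the K decoder evaluations are the same as in the complete-data IWAE, the missing entries add no computational overhead; the same self-normalised importance weights can also be reused to draw single or multiple imputations of the missing values.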
When a missing process depends on the missing values themselves, it needs to be explicitly modelled and taken into account while doing likelihood-based inference. We present an approach for building and fitting deep latent variable models (DLVMs) in cases where the missing process depends on the missing data …
Several statistical models are given in the form of unnormalized densities, and calculation of the normalization constant is intractable. We propose estimation methods for such unnormalized models with missing data. The key concept is to combine imputation …
There are three main challenges in emotion recognition. First, it is difficult to recognize humans' emotional states when only a single modality is considered. Second, it is expensive to manually annotate emotional data. Third, emotional data often suffer …
In population synthesis applications, when considering populations with many attributes, a fundamental problem is the estimation of rare combinations of feature attributes. Unsurprisingly, it is notably more difficult to reliably represent the sparser …
Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Starting from the simple assumption that two batches extracted randomly from the same dataset should share the same distribution, we leverage optimal transport …
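The batch-matching criterion above lends itself to a simple gradient-based imputer. Below is a hedged sketch of one way to instantiate it: missing entries are treated as trainable parameters, and an entropy-regularised OT cost between two random batches is minimised. The plain Sinkhorn iterations, function names, and hyper-parameters are assumptions for illustration; a practical implementation would typically use a debiased Sinkhorn divergence and log-domain updates for small regularisation.

```python
# Hedged sketch of batch-wise OT imputation: missing entries become
# trainable parameters, and an entropy-regularised OT cost between two
# random batches is pushed down by gradient descent. Names and
# hyper-parameters are illustrative, not a reference implementation.
import torch

def sinkhorn_cost(a, b, eps=1.0, n_iter=100):
    """Entropy-regularised OT cost between two point clouds a and b."""
    n, m = a.size(0), b.size(0)
    mu = torch.full((n,), 1.0 / n, device=a.device)
    nu = torch.full((m,), 1.0 / m, device=b.device)
    C = torch.cdist(a, b) ** 2             # pairwise squared Euclidean costs
    Kmat = torch.exp(-C / eps)
    v = torch.ones(m, device=b.device)
    for _ in range(n_iter):                # Sinkhorn fixed-point updates
        u = mu / (Kmat @ v + 1e-9)
        v = nu / (Kmat.t() @ u + 1e-9)
    P = u[:, None] * Kmat * v[None, :]     # approximate transport plan
    return (P * C).sum()

def ot_impute(x, mask, n_steps=500, batch_size=128, lr=1e-2):
    """x: (n, d) data; mask: (n, d) binary, 1 = observed."""
    imp = torch.zeros_like(x, requires_grad=True)   # free missing values
    opt = torch.optim.Adam([imp], lr=lr)
    for _ in range(n_steps):
        filled = x * mask + imp * (1 - mask)
        i = torch.randint(0, x.size(0), (batch_size,))
        j = torch.randint(0, x.size(0), (batch_size,))
        loss = sinkhorn_cost(filled[i], filled[j])  # batches should match
        opt.zero_grad(); loss.backward(); opt.step()
    return (x * mask + imp * (1 - mask)).detach()
```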