
Should a Normal Imputation Model Be Modified to Impute Skewed Variables?

 Added by Paul von Hippel
 Publication date 2017
 Research language: English





Researchers often impute continuous variables under an assumption of normality, yet many incomplete variables are skewed. We find that imputing skewed continuous variables under a normal model can lead to bias; the bias is usually mild for popular estimands such as means, standard deviations, and linear regression coefficients, but it can be severe for more shape-dependent estimands such as percentiles or the coefficient of skewness. We test several methods for adapting a normal imputation model to accommodate skewness, including methods that transform, truncate, or censor (round) normally imputed values, as well as methods that impute values from a quadratic or truncated regression. None of these modifications reliably reduces the biases of the normal model, and some can make the biases much worse. We conclude that, if one has to impute a skewed variable under a normal model, it is usually safest to do so without modifications -- unless one is more interested in estimating percentiles and shape than in estimating means, variances, and regression coefficients. In the conclusion, we briefly discuss promising developments in the area of continuous imputation models that do not assume normality.
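The contrast between means and shape-dependent estimands can be illustrated with a single-variable sketch (simpler than the paper's regression-based setup; the simulation settings and variable names below are illustrative assumptions, using numpy). We impute a lognormal variable once under a normal model and once by imputing on the log scale and back-transforming:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a right-skewed (lognormal) variable with about 30% of values
# missing completely at random.
n = 10_000
x = rng.lognormal(mean=0.0, sigma=1.0, size=n)
miss = rng.random(n) < 0.3
x_obs = x[~miss]

# Normal-model imputation: draw the missing values from N(mean, sd)
# estimated on the observed cases.
mu, sd = x_obs.mean(), x_obs.std(ddof=1)
imp_normal = rng.normal(mu, sd, size=miss.sum())

# Transform-then-impute: impute on the log scale, then back-transform.
z = np.log(x_obs)
imp_log = np.exp(rng.normal(z.mean(), z.std(ddof=1), size=miss.sum()))

def skewness(d):
    """Sample coefficient of skewness: third central moment / sd^3."""
    c = d - d.mean()
    return (c**3).mean() / (c**2).mean() ** 1.5

x_normal = np.concatenate([x_obs, imp_normal])
x_log = np.concatenate([x_obs, imp_log])

# Both completed datasets recover the mean well, but the normal model
# produces impossible negative values and flattens the skewness.
print("means:", round(x_normal.mean(), 2), round(x_log.mean(), 2))
print("min imputed (normal):", round(imp_normal.min(), 2))
print("skewness:", round(skewness(x_normal), 2), round(skewness(x_log), 2))
```

This matches the abstract's pattern: the mean is nearly unbiased either way, while shape-dependent quantities diverge.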




Read More

Inverse probability of treatment weighting (IPTW) is a popular propensity score (PS)-based approach to estimate causal effects in observational studies at risk of confounding bias. A major issue when estimating the PS is the presence of partially observed covariates. Multiple imputation (MI) is a natural approach to handle missing data on covariates, but its use in the PS context raises three important questions: (i) should we apply Rubin's rules to the IPTW treatment effect estimates or to the PS estimates themselves? (ii) does the outcome have to be included in the imputation model? (iii) how should we estimate the variance of the IPTW estimator after MI? We performed a simulation study focusing on the effect of a binary treatment on a binary outcome with three confounders (two of them partially observed). We used MI with chained equations to create complete datasets and compared three ways of combining the results: combining treatment effect estimates (MIte); combining the PS across the imputed datasets (MIps); or combining the PS parameters and estimating the PS of the average covariates across the imputed datasets (MIpar). We also compared the performance of these methods to complete case (CC) analysis and the missingness pattern (MP) approach, a method which uses a different PS model for each pattern of missingness. We also studied empirically the consistency of these three MI estimators. Under a missing at random (MAR) mechanism, CC and MP analyses were biased in most cases when estimating the marginal treatment effect, whereas MI approaches had good performance in reducing bias as long as the outcome was included in the imputation model. However, only MIte was unbiased in all the studied scenarios, and Rubin's rules provided good variance estimates for MIte.
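For question (iii), the MIte pooling step is simply Rubin's rules applied to the per-dataset IPTW treatment effect estimates. A minimal sketch (assuming numpy; the input numbers are made up for illustration):

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Pool M within-imputation estimates and variances (Rubin, 1987)."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    m = len(est)
    qbar = est.mean()            # pooled point estimate
    ubar = var.mean()            # average within-imputation variance
    b = est.var(ddof=1)          # between-imputation variance
    t = ubar + (1 + 1 / m) * b   # total variance
    return qbar, t

# Hypothetical IPTW treatment-effect estimates from M = 5 imputed datasets:
qbar, t = rubins_rules([0.12, 0.15, 0.10, 0.13, 0.14], [0.004] * 5)
print(qbar, t)  # pooled effect and its total variance
```

The between-imputation term `b` is what inflates the variance beyond a naive average, which is why the pooled variance behaves well for MIte.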
In this paper, we present a Weibull link (skewed) model for categorical response data arising from binomial as well as multinomial models. We show that, for such categorical data, the most commonly used models (logit, probit, and complementary log-log) can be obtained as limiting cases. We further compare the proposed model with some other asymmetric models. Bayesian as well as frequentist estimation procedures for binomial and multinomial responses are presented in detail. We analyze two data sets to demonstrate the efficiency of the proposed model.
A multivariate dispersion control chart monitors changes in the process variability of multiple correlated quality characteristics. In this article, we investigate and compare the performance of charts designed to monitor variability based on individual and grouped multivariate observations. We compare one of the most well-known methods for monitoring individual observations -- a multivariate EWMA chart proposed by Huwang et al. -- to various charts based on grouped observations. In addition, we compare charts based on monitoring with overlapping and nonoverlapping subgroups. We recommend using charts based on overlapping subgroups when monitoring with subgroup data. The effect of subgroup size is also investigated. Steady-state average time to signal is used as the performance measure. We show that monitoring methods based on individual observations are the quickest in detecting sustained shifts in the process variability. We use a simulation study to obtain our results and illustrate them with a case study.
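This is not Huwang et al.'s multivariate chart, but the EWMA recursion all such charts build on fits in a few lines (assuming numpy; applying it to squared deviations gives a toy univariate dispersion monitor):

```python
import numpy as np

def ewma(x, lam=0.2, start=0.0):
    """EWMA recursion: z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    z = np.empty(len(x))
    prev = start
    for t, xt in enumerate(x):
        prev = lam * xt + (1 - lam) * prev
        z[t] = prev
    return z

# Toy dispersion monitor: smooth squared observations around a target mean
# of 0; the process standard deviation shifts from 1 to 2 at t = 100.
rng = np.random.default_rng(1)
obs = np.concatenate([rng.normal(0, 1, 100), rng.normal(0, 2, 100)])
stat = ewma(obs**2, lam=0.2, start=1.0)  # start at in-control variance 1
```

A signal would be raised when `stat` crosses a control limit calibrated to the desired in-control average time to signal; choosing that limit is where the design effort in the article lies.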
Intuitively, obedience -- following the order that a human gives -- seems like a good property for a robot to have. But we humans are not perfect, and we may give orders that are not best aligned with our preferences. We show that when a human is not perfectly rational, a robot that tries to infer and act according to the human's underlying preferences can always perform better than a robot that simply follows the human's literal order. Thus, there is a tradeoff between the obedience of a robot and the value it can attain for its owner. We investigate how this tradeoff is affected by the way the robot infers the human's preferences, showing that some methods err more on the side of obedience than others. We then analyze how performance degrades when the robot has a misspecified model of the features that the human cares about or of the human's level of rationality. Finally, we study how robots can start detecting such model misspecification. Overall, our work suggests that there might be a middle ground in which robots intelligently decide when to obey human orders, but err on the side of obedience.
With the rise of the big data phenomenon in recent years, data are arriving in many different complex forms. One example is multi-way data that come in the form of higher-order tensors, such as coloured images and movie clips. Although there has been a recent rise in models for the simpler case of three-way data in the form of matrices, there is a relative paucity of higher-order tensor variate methods. The most common tensor distribution in the literature is the tensor variate normal distribution; however, its use can be problematic if the data exhibit skewness or outliers. Herein, we develop four skewed tensor variate distributions, which to our knowledge are the first skewed tensor distributions to be proposed in the literature, and which are able to parameterize both skewness and tail weight. Properties and parameter estimation are discussed, and real and simulated data are used for illustration.