
On the global identifiability of logistic regression models with misclassified outcomes

Added by Rui Duan
Publication date: 2021
Language: English





In the last decade, the secondary use of large data from health systems, such as electronic health records, has demonstrated great promise in advancing biomedical discoveries and improving clinical decision making. However, there is increasing concern about biases in association studies caused by misclassification in the binary outcomes derived from electronic health records. We revisit the classical logistic regression model with misclassified outcomes. Although local identification conditions in some related settings have previously been established, the global identification of such models remains largely unknown and is an important question yet to be answered. We derive necessary and sufficient conditions for global identifiability of logistic regression models with misclassified outcomes, using a novel approach termed submodel analysis and a technique adapted from the Picard-Lindelöf existence theorem in ordinary differential equations. In particular, our results are applicable to logistic models with discrete covariates, which is a common situation in biomedical studies, and the conditions are easy to verify in practice. In addition to model identifiability, we propose a hypothesis testing procedure for regression coefficients in the misclassified logistic regression model when the model is not identifiable under the null.
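Concretely, the observed label enters the likelihood only through a mixture of the true-outcome probability with the sensitivity and specificity of the surrogate. The sketch below writes out that observed-data likelihood and maximizes it on simulated data; the function names, the simulated design, and the use of scipy.optimize are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic function

def neg_log_lik(params, X, s):
    """Observed-data negative log-likelihood for logistic regression with a
    misclassified binary outcome.  params = (beta..., sens, spec), where
    sens = P(S=1 | Y=1) and spec = P(S=0 | Y=0) are assumed constant in X."""
    beta, sens, spec = params[:-2], params[-2], params[-1]
    p = expit(X @ beta)                       # P(Y = 1 | X), true outcome model
    q = sens * p + (1.0 - spec) * (1.0 - p)   # P(S = 1 | X), observed surrogate
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -np.sum(s * np.log(q) + (1 - s) * np.log(1 - q))

# Illustrative simulated data (not from the paper).
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, expit(X @ np.array([-0.5, 1.0])))            # true outcomes
s = np.where(y == 1, rng.binomial(1, 0.90, n), rng.binomial(1, 0.05, n))

# Keep sensitivity and specificity away from the label-switching region
# (sens + spec > 1); this is separate from the structural identifiability
# conditions derived in the paper.
res = minimize(neg_log_lik, x0=np.array([0.0, 0.0, 0.8, 0.8]), args=(X, s),
               method="L-BFGS-B",
               bounds=[(None, None), (None, None), (0.5, 0.999), (0.5, 0.999)])
print(res.x)  # estimates of (beta0, beta1, sensitivity, specificity)
```

Constraining sensitivity plus specificity to exceed one only removes the trivial label-switching ambiguity; the conditions derived in the paper address the remaining question of whether the full parameter vector is globally identified, in particular when the covariates are discrete.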



Related research

Huiming Zhang (2018)
This short note points out that the proof of the high-dimensional asymptotic normality of the MLE for logistic regression under the regime $p_n=o(n)$ given in the paper "Maximum likelihood estimation in logistic regression models with a diverging number of covariates" (Electronic Journal of Statistics, 6, 1838-1846) is incorrect.
We study parameter identifiability of directed Gaussian graphical models with one latent variable. In the scenario we consider, the latent variable is a confounder that forms a source node of the graph and is a parent to all other nodes, which correspond to the observed variables. We give a graphical condition that is sufficient for the Jacobian matrix of the parametrization map to be full rank, which entails that the parametrization is generically finite-to-one, a fact that is sometimes also referred to as local identifiability. We also derive a graphical condition that is necessary for such identifiability. Finally, we give a condition under which generic parameter identifiability can be determined from identifiability of a model associated with a subgraph. The power of these criteria is assessed via an exhaustive algebraic computational study on models with 4, 5, and 6 observable variables.
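One way to probe this kind of generic (local) identifiability in practice is to evaluate the Jacobian of the parametrization map at a random parameter value and compare its numerical rank with the number of free parameters. The sketch below does this for a toy linear Gaussian structural equation model with a single latent confounder that points to every observed node; the particular graph, the finite-difference Jacobian, and the function names are illustrative assumptions, not the computational study reported in the paper.

```python
import numpy as np

def sigma(params, edges, p):
    """Covariance of the observed variables in a linear Gaussian SEM with one
    latent confounder L (variance fixed at 1) pointing to every observed node.
    params = (edge weights for `edges`, latent loadings, error variances)."""
    k = len(edges)
    B = np.zeros((p, p))
    for (i, j), w in zip(edges, params[:k]):
        B[i, j] = w                       # directed edge i -> j among observed nodes
    lam = params[k:k + p]                 # loadings of L on each observed node
    omg = params[k + p:]                  # error variances
    A = np.linalg.inv(np.eye(p) - B.T)    # solves X = B'X + lam*L + eps
    return A @ (np.outer(lam, lam) + np.diag(omg)) @ A.T

def jacobian_rank(edges, p, eps=1e-6, seed=0):
    """Numerical rank of the Jacobian of params -> vech(Sigma) at a random point."""
    rng = np.random.default_rng(seed)
    k = len(edges)
    theta = np.concatenate([rng.normal(size=k), rng.normal(size=p),
                            rng.uniform(0.5, 1.5, size=p)])
    idx = np.triu_indices(p)
    vech = lambda S: S[idx]
    J = np.empty((len(idx[0]), theta.size))
    for j in range(theta.size):
        d = np.zeros_like(theta); d[j] = eps
        J[:, j] = (vech(sigma(theta + d, edges, p)) -
                   vech(sigma(theta - d, edges, p))) / (2 * eps)
    return np.linalg.matrix_rank(J, tol=1e-7)

p = 4
edges = [(0, 1), (1, 2)]   # a small chain among the observed nodes
print(f"Jacobian rank {jacobian_rank(edges, p)} vs {len(edges) + 2 * p} free parameters")
```

A full-column-rank Jacobian at a generic point indicates that the parametrization is generically finite-to-one; a rank deficit signals that the model cannot be locally identifiable.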
While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which some of the observed variables are conditionally independent given the hidden ones, we demonstrate a general approach for establishing identifiability using algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models, and random graph mixture models, and they lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models the classical definition of identifiability is typically too strong. Instead, generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions.
Recently, the well-known Liu estimator (Liu, 1993) has attracted researchers' attention for regression parameter estimation in ill-conditioned linear models. It has also been argued that imposing a subspace hypothesis restriction on the parameters improves estimation by shrinking toward non-sample information. Chang (2015) proposed the almost unbiased Liu estimator (AULE) in binary logistic regression. In this article, several improved Liu-type estimators, namely the restricted AULE, preliminary test AULE, Stein-type shrinkage AULE, and its positive part, are proposed for estimating the regression parameters in the binary logistic regression model, building on the work of Chang (2015). The performance of the newly defined estimators is analysed through numerical results. A real data example is also provided to support the findings.
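As background for the estimators listed above, the sketch below implements the basic (unrestricted) Liu-type estimator for logistic regression, beta_d = (X'WX + I)^{-1}(X'WX + dI) beta_MLE with W evaluated at the MLE, which is the commonly used logistic analogue of Liu (1993). The helper name, the simulated near-collinear design, and the use of statsmodels are illustrative assumptions; the restricted, preliminary-test, and shrinkage variants proposed in the article are not reproduced here.

```python
import numpy as np
import statsmodels.api as sm

def logistic_liu(X, y, d=0.5):
    """Liu-type estimator for logistic regression:
    beta_d = (X'WX + I)^{-1} (X'WX + dI) beta_MLE,
    with W = diag(pi_hat * (1 - pi_hat)) evaluated at the MLE."""
    mle = sm.Logit(y, X).fit(disp=0)
    beta = np.asarray(mle.params)
    pi = mle.predict(X)
    XtWX = X.T @ (pi * (1 - pi))[:, None] * 1.0 @ np.eye(X.shape[1]) if False else X.T @ np.diag(pi * (1 - pi)) @ X
    eye = np.eye(X.shape[1])
    return np.linalg.solve(XtWX + eye, (XtWX + d * eye) @ beta)

# Illustrative data with a near-collinear design, where shrinkage helps most.
rng = np.random.default_rng(1)
n = 200
z = rng.normal(size=n)
X = sm.add_constant(np.column_stack([z, z + 0.05 * rng.normal(size=n)]))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + z))))
print(logistic_liu(X, y, d=0.5))
```

The shrinkage parameter d in [0, 1] trades bias for variance; d = 1 returns the MLE, while smaller values shrink the coefficients toward the ridge-like solution.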
The logistic regression model is the most popular model for analyzing binary data. In the absence of any prior information, an improper flat prior is often used for the regression coefficients in Bayesian logistic regression models. The resulting intractable posterior density can be explored by running Polson et al.'s (2013) data augmentation (DA) algorithm. In this paper, we establish that the Markov chain underlying Polson et al.'s (2013) DA algorithm is geometrically ergodic. Proving this theoretical result is practically important as it ensures the existence of central limit theorems (CLTs) for sample averages under a finite second moment condition. The CLT in turn allows users of the DA algorithm to calculate standard errors for posterior estimates.
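The DA algorithm in question alternates a Polya-Gamma draw for latent variables with a Gaussian draw for the regression coefficients. A minimal sketch of that Gibbs sampler under a flat prior is given below; it assumes the third-party polyagamma package for the PG(1, z) draws and is meant only to illustrate the Markov chain whose geometric ergodicity is established, not to reproduce the authors' analysis.

```python
import numpy as np
from polyagamma import random_polyagamma  # third-party PG(1, z) sampler

def pg_gibbs(X, y, n_iter=2000, seed=0):
    """Polya-Gamma data augmentation for Bayesian logistic regression
    with a flat prior on beta (Polson, Scott and Windle, 2013)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kappa = y - 0.5
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # 1. Draw the latent Polya-Gamma variables omega_i | beta.
        omega = random_polyagamma(z=X @ beta, random_state=rng)
        # 2. Draw beta | omega, y from its Gaussian full conditional.
        V = np.linalg.inv(X.T @ (omega[:, None] * X))
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)
        draws[t] = beta
    return draws

# Illustrative data; posterior means should be near the generating coefficients.
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([-0.3, 0.8])))))
print(pg_gibbs(X, y).mean(axis=0))
```

Geometric ergodicity of this chain is what licenses Markov chain CLTs, so that batch-means or spectral standard errors for the posterior means printed above are asymptotically valid.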
