Penalized pairwise pseudo likelihood for variable selection with nonignorable missing data

67 0 0.0 ( 0 )

Download Cite

Added by Jiwei Zhao

Publication date 2017

fields Mathematical Statistics

and research's language is English

Authors Jiwei Zhao - Yang Yang - Yang Ning

Methodology

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The regularization approach for variable selection was well developed for a completely observed data set in the past two decades. In the presence of missing values, this approach needs to be tailored to different missing data mechanisms. In this paper, we focus on a flexible and generally applicable missing data mechanism, which contains both ignorable and nonignorable missing data mechanism assumptions. We show how the regularization approach for variable selection can be adapted to the situation under this missing data mechanism. The computational and theoretical properties for variable selection consistency are established. The proposed method is further illustrated by comprehensive simulation studies and real data analyses, for both low and high dimensional settings.

rate research

Tuning parameter selection for penalized likelihood estimation of inverse covariance matrix

581 - Xin Gao , Daniel Q. Pu , Yuehua Wu 2009

In a Gaussian graphical model, the conditional independence between two variables are characterized by the corresponding zero entries in the inverse covariance matrix. Maximum likelihood method using the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and the adaptive LASSO penalty (Zou, 2006) have been proposed in literature. In this article, we establish the result that using Bayesian information criterion (BIC) to select the tuning parameter in penalized likelihood estimation with both types of penalties can lead to consistent graphical model selection. We compare the empirical performance of BIC with cross validation method and demonstrate the advantageous performance of BIC criterion for tuning parameter selection through simulation studies.

Methodology Machine Learning

Fast selection of nonlinear mixed effect models using penalized likelihood

131 - Edouard Ollier 2021

Nonlinear Mixed effects models are hidden variables models that are widely used in many field such as pharmacometrics. In such models, the distribution characteristics of hidden variables can be specified by including several parameters such as covariates or correlations which must be selected. Recent development of pharmacogenomics has brought averaged/high dimensional problems to the field of nonlinear mixed effects modeling for which standard covariates selection techniques like stepwise methods are not well suited. This work proposes to select covariates and correlation parameters using a penalized likelihood approach. The penalized likelihood problem is solved using a stochastic proximal gradient algorithm to avoid inner-outer iterations. Speed of convergence of the proximal gradient algorithm is improved by the use of component-wise adaptive gradient step sizes. The practical implementation and tuning of the proximal gradient algorithm is explored using simulations. Calibration of regularization parameters is performed by minimizing the Bayesian Information Criterion using particle swarm optimization, a zero order optimization procedure. The use of warm restart and parallelization allows to reduce significantly computing time. The performance of the proposed method compared to the traditional grid search strategy is explored using simulated data. Finally, an application to real data from two pharmacokinetics studies is provided, one studying an antifibrinolitic and the other studying an antibiotic.

Methodology Computation

Improved Non-parametric Penalized Maximum Likelihood Estimation for Arbitrarily Censored Survival Data

81 - Justin D. Tubbs , Lane Guolan Chen , Thuan Quoc Thach 2021

Non-parametric maximum likelihood estimation encompasses a group of classic methods to estimate distribution-associated functions from potentially censored and truncated data, with extensive applications in survival analysis. These methods, including the Kaplan-Meier estimator and Turnbulls method, often result in overfitting, especially when the sample size is small. We propose an improvement to these methods by applying kernel smoothing to their raw estimates, based on a BIC-type loss function that balances the trade-off between optimizing model fit and controlling model complexity. In the context of a longitudinal study with repeated observations, we detail our proposed smoothing procedure and optimization algorithm. With extensive simulation studies over multiple realistic scenarios, we demonstrate that our smoothing-based procedure provides better overall accuracy in both survival function estimation and individual-level time-to-event prediction by reducing overfitting. Our smoothing procedure decreases the discrepancy between the estimated and true simulated survival function using interval-censored data by up to 49% compared to the raw un-smoothed estimate, with similar improvements of up to 41% and 23% in within-sample and out-of-sample prediction, respectively. Finally, we apply our method to real data on censored breast cancer diagnosis, which similarly shows improvement when compared to empirical survival estimates from uncensored data. We provide an R package, SISE, for implementing our penalized likelihood method.

Methodology

Robust penalized empirical likelihood in high dimensional longitudinal data analysis

128 - Jiaqi Li , Liya Fu 2021

As an effective nonparametric method, empirical likelihood (EL) is appealing in combining estimating equations flexibly and adaptively for incorporating data information. To select important variables and estimating equations in the sparse high-dimensional model, we consider a penalized EL method based on robust estimating functions by applying two penalty functions for regularizing the regression parameters and the associated Lagrange multipliers simultaneously, which allows the dimensionalities of both regression parameters and estimating equations to grow exponentially with the sample size. A first inspection on the robustness of estimating equations contributing to the estimating equations selection and variable selection is discussed from both theoretical perspective and intuitive simulation results in this paper. The proposed method can improve the robustness and effectiveness when the data have underlying outliers or heavy tails in the response variables and/or covariates. The robustness of the estimator is measured via the bounded influence function, and the oracle properties are also established under some regularity conditions. Extensive simulation studies and a yeast cell data are used to evaluate the performance of the proposed method. The numerical results reveal that the robustness of sparse estimating equations selection fundamentally enhances variable selection accuracy when the data have heavy tails and/or include underlying outliers.

Methodology

Robust approach for variable selection with high dimensional Logitudinal data analysis

169 - Liya Fu , Jiaqi Li , You-Gan Wang 2020

This paper proposes a new robust smooth-threshold estimating equation to select important variables and automatically estimate parameters for high dimensional longitudinal data. A novel working correlation matrix is proposed to capture correlations within the same subject. The proposed procedure works well when the number of covariates p increases as the number of subjects n increases. The proposed estimates are competitive with the estimates obtained with the true correlation structure, especially when the data are contaminated. Moreover, the proposed method is robust against outliers in the response variables and/or covariates. Furthermore, the oracle properties for robust smooth-threshold estimating equations under large n, diverging p are established under some regularity conditions. Extensive simulation studies and a yeast cell cycle data are used to evaluate the performance of the proposed method, and results show that our proposed method is competitive with existing robust variable selection procedures.

Methodology