Sparse group variable selection for gene-environment interactions in the longitudinal study


Abstract in English

Penalized variable selection for high-dimensional longitudinal data has received much attention because it accounts for the correlation among repeated measurements and provides additional information that improves identification and prediction performance. Despite this success, the potential of penalization methods for accommodating structured sparsity in longitudinal studies is far from fully understood. In this article, we develop a sparse group penalization method to conduct a bi-level gene-environment (G$\times$E) interaction study with a repeatedly measured phenotype. Within the quadratic inference function (QIF) framework, the proposed method achieves simultaneous identification of main and interaction effects at both the group and individual levels. Simulation studies show that the proposed method outperforms major competitors. In a case study of asthma data from the Childhood Asthma Management Program (CAMP), we conduct a G$\times$E study using high-dimensional SNP data as the genetic factors and the longitudinal trait, forced expiratory volume in one second (FEV1), as the phenotype. Our method leads to improved prediction and identification of main and interaction effects with important implications.
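To illustrate what selection at both the group and individual levels can look like, the following is a minimal sketch of a generic sparse group shrinkage step. It is not the estimator proposed in this work: the abstract does not state the exact penalty, so a convex lasso-type penalty is assumed here, the QIF objective is not modeled, and the names `soft_threshold`, `sparse_group_prox`, `lam_individual`, and `lam_group` are hypothetical. Each group is assumed to hold one genetic factor's main effect together with its interactions with the environmental factors.

```python
import math


def soft_threshold(z, t):
    """Shrink a single coefficient toward zero by t (zero if |z| <= t)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def sparse_group_prox(beta_group, lam_individual, lam_group):
    """Shrinkage step for the assumed penalty
    lam_individual * ||b||_1 + lam_group * ||b||_2 on one group of coefficients.
    """
    # Individual level: shrink each coefficient, zeroing out weak ones.
    shrunk = [soft_threshold(b, lam_individual) for b in beta_group]
    # Group level: shrink the whole vector, dropping the group if its norm is small.
    norm = math.sqrt(sum(b * b for b in shrunk))
    if norm <= lam_group:
        return [0.0] * len(shrunk)
    scale = 1.0 - lam_group / norm
    return [scale * b for b in shrunk]


# Hypothetical groups: [main effect, interaction 1, interaction 2] for one SNP.
strong_group = sparse_group_prox([3.0, 0.2, -4.0], lam_individual=0.5, lam_group=1.0)
weak_group = sparse_group_prox([0.3, -0.4, 0.6], lam_individual=0.5, lam_group=1.0)
```

In this sketch the strong group is retained but its weak middle coefficient is set to zero (individual-level sparsity), while the weak group is removed entirely (group-level sparsity).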
