A note on marginal correlation based screening


Abstract in English

Independence screening methods such as the two sample $t$-test and the marginal correlation based ranking are among the most widely used techniques for variable selection in ultrahigh dimensional data sets. In this short note, simple examples are used to demonstrate potential problems with the independence screening methods in the presence of correlated predictors. Also, an example is considered where all important variables are independent among themselves and all but one important variables are independent with the unimportant variables. Furthermore, a real data example from a genome wide association study is used to illustrate inferior performance of marginal correlation screening compared to another screening method.

Download