
A note on marginal correlation based screening

Added by Vivekananda Roy
Publication date: 2017
Language: English





Independence screening methods such as the two sample $t$-test and marginal correlation based ranking are among the most widely used techniques for variable selection in ultrahigh dimensional data sets. In this short note, simple examples are used to demonstrate potential problems with independence screening methods in the presence of correlated predictors. Also, an example is considered where all important variables are independent of each other and all but one of the important variables are independent of the unimportant variables. Furthermore, a real data example from a genome-wide association study is used to illustrate the inferior performance of marginal correlation screening compared to another screening method.
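The failure mode with correlated predictors can be reproduced in a few lines. The sketch below is illustrative and not taken from the note; all parameter values are made up. It constructs a truly important predictor whose marginal correlation with the response is exactly zero, because its direct effect is cancelled by its correlation with another predictor, so marginal ranking would screen it out:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 5000, 0.5

# x1 and x2 are jointly normal with correlation rho
cov = np.array([[1.0, rho], [rho, 1.0]])
x1, x2 = rng.multivariate_normal(np.zeros(2), cov, size=n).T

# x2 is truly important (coefficient -rho), but the coefficients are chosen
# so that cov(x2, y) = rho*1 + 1*(-rho) = 0: x2 is invisible to marginal screening
y = x1 - rho * x2 + rng.normal(size=n)

r1 = np.corrcoef(x1, y)[0, 1]
r2 = np.corrcoef(x2, y)[0, 1]
print(f"marginal corr with x1: {r1:.2f}, with x2: {r2:.2f}")
```

Here cov(x2, y) = rho*cov(x2, x1) type cancellation makes the sample correlation with x2 hover near zero, while x2's coefficient in the true model is -0.5.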



Related research

This paper introduces a new method named Distance-based Independence Screening for Canonical Analysis (DISCA) to reduce the dimensions of two random vectors of arbitrary dimensions. The objective of the method is to identify low dimensional linear projections of the two random vectors such that any further reduction based on linear projection would necessarily destroy some dependence structure -- the removed components are not independent. The essence of DISCA is to use the distance correlation to eliminate redundant dimensions until this becomes infeasible. Unlike existing canonical analysis methods, DISCA does not require the dimensions of the reduced subspaces of the two random vectors to be equal, nor does it require distributional assumptions on the random vectors. We show that under mild conditions, the approach uncovers the lowest possible linear dependency structures between the two random vectors, and that our conditions are weaker than those of some sufficient linear subspace-based methods. Numerically, DISCA requires solving a non-convex optimization problem. We formulate it as a difference-of-convex (DC) optimization problem and then adopt the alternating direction method of multipliers (ADMM) on the convex step of the DC algorithm to parallelize and accelerate the computation. Some sufficient linear subspace-based methods use a potentially numerically intensive bootstrap to determine the dimensions of the reduced subspaces in advance; our method avoids this complexity. In simulations, we present cases that DISCA solves effectively while other methods cannot. In both the simulation studies and real data cases, when other state-of-the-art dimension reduction methods are applicable, DISCA performs comparably to or better than most of them. Code and an R package can be found on GitHub: https://github.com/ChuanpingYu/DISCA.
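DISCA's screening criterion is the distance correlation. As a self-contained illustration (a textbook implementation of the sample distance correlation of Szekely, Rizzo and Bakirov, not DISCA itself), the following computes it for one-dimensional samples and shows it detecting a nonlinear dependence that Pearson correlation misses:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation (Szekely, Rizzo and Bakirov) for 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)  # pairwise distance matrices
    b = np.abs(y - y.T)
    # Double-center each distance matrix
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = max((A * B).mean(), 0.0)  # clamp tiny negative rounding error
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(dcov2 / denom))

x = np.linspace(-1.0, 1.0, 200)
print(np.corrcoef(x, x ** 2)[0, 1])     # essentially 0: Pearson misses y = x^2
print(distance_correlation(x, x ** 2))  # clearly positive: dependence detected
```

Distance correlation is zero if and only if the two samples' underlying variables are independent, which is what makes it usable as an elimination criterion.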
J. Kiukas, J.-P. Pellonpaa (2006)
We give a characterization for the extreme points of the convex set of correlation matrices with a countable index set. A Hermitian matrix is called a correlation matrix if it is positive semidefinite with unit diagonal entries. Using the characterization we show that there exist extreme points of any rank.
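The definition used here is easy to check numerically. A minimal sketch (the function name is ours) verifies the defining properties -- Hermitian, positive semidefinite, unit diagonal -- and builds a rank-one example: $C = vv^*$ with unit-modulus entries is a correlation matrix, and rank-one correlation matrices are extreme points of the set:

```python
import numpy as np

def is_correlation_matrix(m, tol=1e-10):
    """Check the definition: Hermitian, positive semidefinite, unit diagonal."""
    m = np.asarray(m)
    hermitian = np.allclose(m, m.conj().T, atol=tol)
    unit_diag = np.allclose(np.diag(m).real, 1.0, atol=tol)
    psd = np.linalg.eigvalsh(m).min() >= -tol if hermitian else False
    return hermitian and unit_diag and psd

# Rank-one example: C = v v* with |v_i| = 1 entrywise
v = np.exp(1j * np.array([0.0, 0.7, 1.9]))
C = np.outer(v, v.conj())
print(is_correlation_matrix(C), np.linalg.matrix_rank(C))  # True 1
```

A rank-one PSD matrix cannot be written as a nontrivial convex combination of other correlation matrices (any summand's range must lie in the one-dimensional range of $C$), which is why these examples are extreme points.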
This paper provides general matrix formulas for computing the score function, the (expected and observed) Fisher information and the $\Delta$ matrices (required for the assessment of local influence) for a quite general model which includes the one proposed by Russo et al. (2009). Additionally, we present an expression for the generalized leverage. The matrix formulation has a considerable advantage: despite the complexity of the postulated model, all general formulas are compact, clear and take convenient forms.
We develop a Bayesian variable selection method, called SVEN, based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space. Sparsity is achieved by using degenerate spike priors on inactive variables, whereas Gaussian slab priors are placed on the coefficients of the important predictors, making the posterior probability of a model available in explicit form (up to a normalizing constant). Strong model selection consistency is shown to be attained when the number of predictors grows nearly exponentially with the sample size, and even when the norm of the mean effects due solely to the unimportant variables diverges, which is a novel attractive feature. An appealing byproduct of SVEN is the construction of novel model weight adjusted prediction intervals. Embedding a unique model-based screening step and using fast Cholesky updates, SVEN produces a highly scalable computational framework to explore gigantic model spaces, rapidly identify regions of high posterior probability, and make fast inference and prediction. A temperature schedule guided by our model selection consistency derivations is used to further mitigate multimodal posterior distributions. The performance of SVEN is demonstrated through a number of simulation experiments and a real data example from a genome-wide association study with over half a million markers.
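The idea of a model's posterior probability being available in explicit form up to a normalizing constant can be illustrated with a much simpler conjugate toy model. This is a stand-in, not SVEN's hierarchical prior: the slab variance, inclusion probability and known error variance below are all simplifying assumptions made for the sketch.

```python
import itertools
import numpy as np

def log_model_posterior(y, X, gamma, lam=1.0, w=0.2, sigma2=1.0):
    """Unnormalized log posterior of the model indexed by gamma: a Gaussian
    slab N(0, sigma2/lam) on active coefficients and an independent
    Bernoulli(w) inclusion prior (sigma2 is treated as known here)."""
    n, p = X.shape
    Xg = X[:, list(gamma)]
    # Marginalizing the slab gives y ~ N(0, sigma2 * (I + Xg Xg^T / lam))
    S = sigma2 * (np.eye(n) + Xg @ Xg.T / lam)
    _, logdet = np.linalg.slogdet(S)
    quad = y @ np.linalg.solve(S, y)
    log_lik = -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
    log_prior = len(gamma) * np.log(w) + (p - len(gamma)) * np.log(1 - w)
    return log_lik + log_prior

rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

# With only 2^5 models we can enumerate; SVEN instead explores gigantic
# model spaces stochastically, scoring each visited model in closed form
models = [g for r in range(p + 1) for g in itertools.combinations(range(p), r)]
best = max(models, key=lambda g: log_model_posterior(y, X, g))
print(best)  # the true active variables 0 and 2 should be selected
```

Because each model's score is available in closed form, no within-model Monte Carlo integration is needed; the hard part, which SVEN addresses with screening and Cholesky updates, is the search over the model space itself.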
A wide variety of detection applications exploit the timing correlations that result from the slowing and eventual capture of neutrons. These include capture-gated neutron spectrometry, multiple neutron counting for fissile material detection and identification, and antineutrino detection. There are several distinct processes that result in correlated signals in these applications. Depending on the application, one class of correlated events can be a background that is difficult to distinguish from the class that is of interest. Furthermore, the correlation timing distribution depends on the neutron capture agent and detector geometry. Here, we explain the important characteristics of the neutron capture timing distribution, making reference to simulations and data from a number of detectors currently in use or under development. We point out several features that may assist in background discrimination, and that must be carefully accounted for if accurate detection efficiencies are to be quoted.
