This paper proposes an innovative method for constructing confidence intervals and assessing $p$-values in statistical inference for high-dimensional linear models. The proposed method breaks the high-dimensional inference problem into a series of low-dimensional inference problems: for each regression coefficient $\beta_i$, the confidence interval and $p$-value are computed by regressing on a subset of variables selected according to the conditional independence relations between the corresponding variable $X_i$ and the other variables. Since this subset forms a Markov neighborhood of $X_i$ in the Markov network formed by all the variables $X_1, X_2, \ldots, X_p$, the proposed method is coined Markov neighborhood regression. The method is tested on high-dimensional linear, logistic and Cox regression, and the numerical results indicate that it significantly outperforms existing methods. Based on Markov neighborhood regression, a method of learning causal structures for high-dimensional linear models is proposed and applied to the identification of drug-sensitive genes and cancer driver genes. The idea of using conditional independence relations for dimension reduction is general and can potentially be extended to other high-dimensional or big-data problems as well.
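For concreteness, here is a minimal Python sketch of the idea, assuming the Markov neighborhood is read off an estimated precision matrix (graphical lasso) and, as a further assumption of the sketch, that the conditioning set is enlarged by Lasso-selected predictors of the response; the function markov_neighborhood_ci and its arguments are illustrative, not the authors' implementation.

import numpy as np
from sklearn.covariance import GraphicalLassoCV
from sklearn.linear_model import LassoCV
import statsmodels.api as sm

def markov_neighborhood_ci(X, y, j, alpha=0.05):
    """Illustrative sketch: confidence interval and p-value for beta_j via a
    low-dimensional regression on X_j plus an estimated Markov neighborhood."""
    # Step 1: estimate the Markov network of the covariates; the nonzero
    # off-diagonal entries in row j of the precision matrix give the
    # Markov neighborhood of X_j.
    precision = GraphicalLassoCV().fit(X).precision_
    neighbors = np.flatnonzero(precision[j] != 0)
    neighbors = neighbors[neighbors != j]
    # Step 2 (assumption of this sketch): also include predictors of y picked
    # by a Lasso fit, so the low-dimensional model is not badly misspecified.
    selected = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_ != 0)
    subset = np.unique(np.concatenate(([j], neighbors, selected)))
    # Step 3: ordinary least squares on the low-dimensional subset yields a
    # standard confidence interval and p-value for beta_j.
    design = sm.add_constant(X[:, subset])
    fit = sm.OLS(y, design).fit()
    pos = 1 + int(np.flatnonzero(subset == j)[0])  # +1 for the intercept column
    return fit.params[pos], tuple(fit.conf_int(alpha=alpha)[pos]), fit.pvalues[pos]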
Among the most popular variable selection procedures in high-dimensional regression, Lasso provides a solution path to rank the variables and determines a cut-off position on the path to select variables and estimate coefficients. In this paper, we c
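As a point of reference for the path-and-cut-off description above, a small sketch using scikit-learn's lasso_path; the function rank_and_cut and its fixed cut-off parameter are hypothetical illustrations of ranking variables by their entry point on the solution path, not the procedure developed in the paper.

import numpy as np
from sklearn.linear_model import lasso_path

def rank_and_cut(X, y, cutoff=10, n_alphas=100):
    """Illustrative sketch: rank variables by where they enter the Lasso
    solution path, then keep the first `cutoff` variables."""
    # coefs has shape (p, n_alphas) with alphas in decreasing order,
    # so earlier columns correspond to stronger penalties.
    alphas, coefs, _ = lasso_path(X, y, n_alphas=n_alphas)
    p = coefs.shape[0]
    entry = np.full(p, np.inf)
    for j in range(p):
        nonzero = np.flatnonzero(coefs[j] != 0)
        if nonzero.size > 0:
            entry[j] = nonzero[0]      # first alpha index where beta_j becomes nonzero
    order = np.argsort(entry)          # variables ranked by entry order on the path
    selected = order[:cutoff]          # cut-off position on the ranking
    # Report coefficients at the penalty level where the last selected
    # variable has just entered the path.
    cut_idx = int(np.max(entry[selected])) if np.all(np.isfinite(entry[selected])) else coefs.shape[1] - 1
    return selected, coefs[:, cut_idx]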
With the availability of high-dimensional genetic biomarkers, it is of interest to identify heterogeneous effects of these predictors on patients' survival, along with proper statistical inference. Censored quantile regression has emerged as a powerfu
Labeling patients in electronic health records with respect to their disease or condition status, i.e., case or control status, has increasingly relied on prediction models using high-dimensional variables derived from structured and u
There are many scenarios, such as electronic health records, where the outcome is much more difficult to collect than the covariates. In this paper, we consider the linear regression problem with such a data structure under high dimensionality.
Heterogeneity is an important feature of modern data sets, and a central task is to extract information from large-scale and heterogeneous data. In this paper, we consider multiple high-dimensional linear models and adopt the definition of maximin eff