ﻻ يوجد ملخص باللغة العربية
Factor models are a class of powerful statistical models that have been widely used to deal with dependent measurements that arise frequently from various applications from genomics and neuroscience to economics and finance. As data are collected at an ever-growing scale, statistical machine learning faces some new challenges: high dimensionality, strong dependence among observed variables, heavy-tailed variables and heterogeneity. High-dimensional robust factor analysis serves as a powerful toolkit to conquer these challenges. This paper gives a selective overview on recent advance on high-dimensional factor models and their applications to statistics including Factor-Adjusted Robust Model selection (FarmSelect) and Factor-Adjusted Robust Multiple testing (FarmTest). We show that classical methods, especially principal component analysis (PCA), can be tailored to many new problems and provide powerful tools for statistical estimation and inference. We highlight PCA and its connections to matrix perturbation theory, robust statistics, random projection, false discovery rate, etc., and illustrate through several applications how insights from these fields yield solutions to modern challenges. We also present far-reaching connections between factor models and popular statistical learning problems, including network analysis and low-rank matrix recovery.
While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a s
In high-dimensions, the prior tails can have a significant effect on both posterior computation and asymptotic concentration rates. To achieve optimal rates while keeping the posterior computations relatively simple, an empirical Bayes approach has r
Though Gaussian graphical models have been widely used in many scientific fields, limited progress has been made to link graph structures to external covariates because of substantial challenges in theory and computation. We propose a Gaussian graphi
Robust real-time monitoring of high-dimensional data streams has many important real-world applications such as industrial quality control, signal detection, biosurveillance, but unfortunately it is highly non-trivial to develop efficient schemes due
Heterogeneity is an important feature of modern data sets and a central task is to extract information from large-scale and heterogeneous data. In this paper, we consider multiple high-dimensional linear models and adopt the definition of maximin eff