
Robust Sparse Bayesian Infinite Factor Models

Published by Jaejoon Lee
Publication date: 2020
Research field: Mathematical Statistics
Research language: English





Most previous work on, and applications of, Bayesian factor models have assumed a normal likelihood regardless of its validity. We propose a Bayesian factor model for heavy-tailed high-dimensional data based on a multivariate Student-$t$ likelihood to obtain better covariance estimation. We use the multiplicative gamma process shrinkage prior and the factor number adaptation scheme proposed in Bhattacharya & Dunson [Biometrika (2011) 291-306]. Since a naive Gibbs sampler for the proposed model suffers from slow mixing, we propose a Markov chain Monte Carlo algorithm in which the fast mixing of Hamiltonian Monte Carlo is exploited for some parameters of the proposed model. Simulation results illustrate the gain in covariance estimation performance for heavy-tailed high-dimensional data. We also provide a theoretical result that the posterior of the proposed model is weakly consistent under reasonable conditions. We conclude the paper with an application of the proposed factor model to breast cancer metastasis prediction given DNA signature data of cancer cells.
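As a rough illustration of the two ingredients named in the abstract, the following sketch draws factor loadings from a multiplicative gamma process (MGP) prior in the form given by Bhattacharya & Dunson, and then simulates heavy-tailed data by writing the multivariate Student-$t$ as a gamma scale mixture of Gaussians. The function names, the hyperparameter defaults (`a1`, `a2`, `nu`), and the choice to apply a single mixing weight per observation are assumptions made for illustration; the paper's exact parameterization and its sampler are not reproduced here.

```python
import numpy as np

def sample_mgp_loadings(p, k, a1=2.0, a2=3.0, nu=3.0, rng=None):
    """Draw a p x k loading matrix from an MGP shrinkage prior (illustrative defaults)."""
    rng = np.random.default_rng() if rng is None else rng
    # Local precisions phi_jh ~ Gamma(nu/2, rate = nu/2); NumPy takes scale = 1/rate.
    phi = rng.gamma(shape=nu / 2, scale=2 / nu, size=(p, k))
    # Global column precisions tau_h = delta_1 * ... * delta_h,
    # with delta_1 ~ Gamma(a1, 1) and delta_l ~ Gamma(a2, 1) for l >= 2.
    delta = np.concatenate([rng.gamma(a1, 1.0, size=1),
                            rng.gamma(a2, 1.0, size=k - 1)])
    tau = np.cumprod(delta)
    # lambda_jh ~ N(0, 1 / (phi_jh * tau_h)); later columns tend to shrink toward zero
    # when the delta_l tend to exceed one.
    loadings = rng.normal(size=(p, k)) / np.sqrt(phi * tau)
    return loadings, tau

def sample_t_factor_data(n, loadings, sigma2, df, rng=None):
    """Simulate n observations from a Student-t factor model via a scale mixture."""
    rng = np.random.default_rng() if rng is None else rng
    p, k = loadings.shape
    eta = rng.normal(size=(n, k))                    # latent factors
    eps = rng.normal(size=(n, p)) * np.sqrt(sigma2)  # idiosyncratic noise
    z = eta @ loadings.T + eps                       # Gaussian, scale matrix LL^T + diag(sigma2)
    # One mixing weight per observation: w ~ Gamma(df/2, rate = df/2).
    w = rng.gamma(shape=df / 2, scale=2 / df, size=(n, 1))
    return z / np.sqrt(w)                            # multivariate Student-t with df degrees of freedom

rng = np.random.default_rng(0)
L, tau = sample_mgp_loadings(p=20, k=5, rng=rng)
Y = sample_t_factor_data(n=100, loadings=L, sigma2=np.ones(20), df=4.0, rng=rng)
```

For `df > 2`, the covariance of each row of `Y` is `df / (df - 2)` times the scale matrix `L @ L.T + diag(sigma2)`, which is why a Gaussian fit to such data can misjudge the covariance structure.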


Read also

This paper investigates high-dimensional linear regression with highly correlated covariates. In this setup, the traditional sparsity assumption on the regression coefficients often fails to hold, and consequently many model selection procedures do not work. To address this challenge, we model the variations of covariates by a factor structure. Specifically, strong correlations among covariates are explained by common factors and the remaining variations are interpreted as idiosyncratic components of each covariate. This leads to a factor-adjusted regression model with both common factors and idiosyncratic components as covariates. We generalize the traditional sparsity assumption accordingly and assume that all common factors but only a small number of idiosyncratic components contribute to the response. A Bayesian procedure with a spike-and-slab prior is then proposed for parameter estimation and model selection. Simulation studies show that our Bayesian method outperforms its lasso analogue, is insensitive to overestimation of the number of common factors, pays a negligible price in the no-correlation case, and scales well with increasing sample size, dimensionality and sparsity. Numerical results on a real dataset of U.S. bond risk premia and macroeconomic indicators lend strong support to our methodology.
We propose a novel approach to estimating the precision matrix of multivariate Gaussian data that relies on decomposing it into a low-rank and a diagonal component. Such decompositions are very popular for modeling large covariance matrices, as they admit a latent-factor-based representation that allows easy inference. The same is not true for precision matrices, due to the lack of a computationally convenient representation, which restricts their use to low- to moderate-dimensional problems. We address this remarkable gap in the literature by introducing a novel latent variable representation for such decompositions for precision matrices as well. The construction leads to an efficient Gibbs sampler that scales very well to high-dimensional problems far beyond the limits of the current state of the art. The ability to efficiently explore the full posterior space allows the model uncertainty to be easily assessed. The decomposition also crucially allows us to adapt sparsity-inducing priors to shrink the insignificant entries of the precision matrix toward zero, making the approach adaptable to high-dimensional small-sample-size sparse settings. Exact zeros in the matrix encoding the underlying conditional independence graph are then determined via a novel posterior false discovery rate control procedure. We evaluate the method's empirical performance through synthetic experiments and illustrate its practical utility in data sets from two different application domains.
We propose a new, flexible model for inference of the effect of a binary treatment on a continuous outcome observed over subsequent time periods. The model allows one to separate association due to endogeneity of treatment selection from additional longitudinal association of the outcomes, and hence permits unbiased estimation of dynamic treatment effects. We investigate the performance of the proposed method on simulated data and employ it to reanalyse data on the longitudinal effects of a long maternity leave on mothers' earnings after their return to the labour market.
Its conceptual appeal and effectiveness have made latent factor modeling an indispensable tool for multivariate analysis. Despite its popularity across many fields, there are outstanding methodological challenges that have hampered practical deployments. One major challenge is the selection of the number of factors, which is exacerbated for dynamic factor models, where factors can disappear, emerge, and/or reoccur over time. Existing tools that assume a fixed number of factors may provide a misguided representation of the data mechanism, especially when the number of factors is crudely misspecified. Another challenge is the interpretability of the factor structure, which is often regarded as an unattainable objective due to the lack of identifiability. Motivated by a topical macroeconomic application, we develop a flexible Bayesian method for dynamic factor analysis (DFA) that can simultaneously accommodate a time-varying number of factors and enhance interpretability without strict identifiability constraints. To this end, we turn to dynamic sparsity by employing Dynamic Spike-and-Slab (DSS) priors within DFA. Scalable Bayesian EM estimation is proposed for fast posterior mode identification via rotations to sparsity, enabling Bayesian data analysis at scales that would have been previously time-consuming. We study a large-scale balanced panel of macroeconomic variables covering multiple facets of the US economy, with a focus on the Great Recession, to highlight the efficacy and usefulness of our proposed method.
Factor models are a class of powerful statistical models that have been widely used to deal with dependent measurements that arise frequently in applications ranging from genomics and neuroscience to economics and finance. As data are collected at an ever-growing scale, statistical machine learning faces some new challenges: high dimensionality, strong dependence among observed variables, heavy-tailed variables and heterogeneity. High-dimensional robust factor analysis serves as a powerful toolkit to conquer these challenges. This paper gives a selective overview of recent advances in high-dimensional factor models and their applications to statistics, including Factor-Adjusted Robust Model selection (FarmSelect) and Factor-Adjusted Robust Multiple testing (FarmTest). We show that classical methods, especially principal component analysis (PCA), can be tailored to many new problems and provide powerful tools for statistical estimation and inference. We highlight PCA and its connections to matrix perturbation theory, robust statistics, random projection, false discovery rate, etc., and illustrate through several applications how insights from these fields yield solutions to modern challenges. We also present far-reaching connections between factor models and popular statistical learning problems, including network analysis and low-rank matrix recovery.
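The role of PCA in factor-based covariance estimation, mentioned in the last abstract above, can be sketched as follows. This is a plain, non-robust illustration: it takes the top `k` principal components of the sample covariance as the low-rank part and keeps the leftover variances on the diagonal. The function name `pca_factor_covariance` and the assumption that `k` is known are hypothetical choices for this example, and none of the robustification used by FarmSelect or FarmTest is attempted.

```python
import numpy as np

def pca_factor_covariance(X, k):
    """Low-rank-plus-diagonal covariance estimate from the top-k principal components."""
    Xc = X - X.mean(axis=0)                 # center each variable
    n = Xc.shape[0]
    S = Xc.T @ Xc / (n - 1)                 # sample covariance (p x p)
    evals, evecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]       # indices of the k largest eigenvalues
    # Loadings: eigenvectors scaled by the square roots of their eigenvalues.
    loadings = evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))
    low_rank = loadings @ loadings.T
    # Idiosyncratic variances: what the factors leave unexplained on the diagonal.
    resid_var = np.clip(np.diag(S) - np.diag(low_rank), 1e-12, None)
    return loadings, low_rank + np.diag(resid_var)

rng = np.random.default_rng(0)
true_loadings = rng.normal(size=(10, 2))
X = rng.normal(size=(200, 2)) @ true_loadings.T + rng.normal(size=(200, 10))
loadings_hat, sigma_hat = pca_factor_covariance(X, k=2)
```

By construction the estimate reproduces the sample variances on its diagonal while replacing the off-diagonal entries with their rank-`k` approximation. With heavy-tailed data the sample covariance itself becomes unreliable, which is the motivation for the robust variants the overview discusses.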