Do you want to publish a course? Click here

Noncrossing structured additive multiple-output Bayesian quantile regression models

102   0   0.0 ( 0 )
 Added by Bruno Santos
 Publication date 2019
and research's language is English




Ask ChatGPT about the research

Quantile regression models are a powerful tool for studying different points of the conditional distribution of univariate response variables. Their multivariate counterpart extension though is not straightforward, starting with the definition of multivariate quantiles. We propose here a flexible Bayesian quantile regression model when the response variable is multivariate, where we are able to define a structured additive framework for all predictor variables. We build on previous ideas considering a directional approach to define the quantiles of a response variable with multiple-outputs and we define noncrossing quantiles in every directional quantile model. We define a Markov Chain Monte Carlo (MCMC) procedure for model estimation, where the noncrossing property is obtained considering a Gaussian process design to model the correlation between several quantile regression models. We illustrate the results of these models using two data sets: one on dimensions of inequality in the population, such as income and health; the second on scores of students in the Brazilian High School National Exam, considering three dimensions for the response variable.



rate research

Read More

We propose a novel spike and slab prior specification with scaled beta prime marginals for the importance parameters of regression coefficients to allow for general effect selection within the class of structured additive distributional regression. This enables us to model effects on all distributional parameters for arbitrary parametric distributions, and to consider various effect types such as non-linear or spatial effects as well as hierarchical regression structures. Our spike and slab prior relies on a parameter expansion that separates blocks of regression coefficients into overall scalar importance parameters and vectors of standardised coefficients. Hence, we can work with a scalar quantity for effect selection instead of a possibly high-dimensional effect vector, which yields improved shrinkage and sampling performance compared to the classical normal-inverse-gamma prior. We investigate the propriety of the posterior, show that the prior yields desirable shrinkage properties, propose a way of eliciting prior parameters and provide efficient Markov Chain Monte Carlo sampling. Using both simulated and three large-scale data sets, we show that our approach is applicable for data with a potentially large number of covariates, multilevel predictors accounting for hierarchically nested data and non-standard response distributions, such as bivariate normal or zero-inflated Poisson.
97 - Nadja Klein , Jorge Mateu 2021
Statistical techniques used in air pollution modelling usually lack the possibility to understand which predictors affect air pollution in which functional form; and are not able to regress on exceedances over certain thresholds imposed by authorities directly. The latter naturally induce conditional quantiles and reflect the seriousness of particular events. In the present paper we focus on this important aspect by developing quantile regression models further. We propose a general Bayesian effect selection approach for additive quantile regression within a highly interpretable framework. We place separate normal beta prime spike and slab priors on the scalar importance parameters of effect parts and implement a fast Gibbs sampling scheme. Specifically, it enables to study quantile-specific covariate effects, allows these covariates to be of general functional form using additive predictors, and facilitates the analysts decision whether an effect should be included linearly, non-linearly or not at all in the quantiles of interest. In a detailed analysis on air pollution data in Madrid (Spain) we find the added value of modelling extreme nitrogen dioxide (NO2) concentrations and how thresholds are driven differently by several climatological variables and traffic as a spatial proxy. Our results underpin the need of enhanced statistical models to support short-term decisions and enable local authorities to mitigate or even prevent exceedances of NO2 concentration limits.
We develop a Bayesian sum-of-trees model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BARTs many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment and on a drug discovery classification problem.
Quantile regression is studied in combination with a penalty which promotes structured (or group) sparsity. A mixed $ell_{1,infty}$-norm on the parameter vector is used to impose structured sparsity on the traditional quantile regression problem. An algorithm is derived to calculate the piece-wise linear solution path of the corresponding minimization problem. A Matlab implementation of the proposed algorithm is provided and some applications of the methods are also studied.
Graphical models are ubiquitous tools to describe the interdependence between variables measured simultaneously such as large-scale gene or protein expression data. Gaussian graphical models (GGMs) are well-established tools for probabilistic exploration of dependence structures using precision matrices and they are generated under a multivariate normal joint distribution. However, they suffer from several shortcomings since they are based on Gaussian distribution assumptions. In this article, we propose a Bayesian quantile based approach for sparse estimation of graphs. We demonstrate that the resulting graph estimation is robust to outliers and applicable under general distributional assumptions. Furthermore, we develop efficient variational Bayes approximations to scale the methods for large data sets. Our methods are applied to a novel cancer proteomics data dataset wherein multiple proteomic antibodies are simultaneously assessed on tumor samples using reverse-phase protein arrays (RPPA) technology.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا