No Arabic abstract
Beta regression models provide an adequate approach for modeling continuous outcomes limited to the interval (0,1). This paper deals with an extension of beta regression models that allow for explanatory variables to be measured with error. The structural approach, in which the covariates measured with error are assumed to be random variables, is employed. Three estimation methods are presented, namely maximum likelihood, maximum pseudo-likelihood and regression calibration. Monte Carlo simulations are used to evaluate the performance of the proposed estimators and the naive estimator. Also, a residual analysis for beta regression models with measurement errors is proposed. The results are illustrated in a real data set.
We study high-dimensional Bayesian linear regression with a general beta prime distribution for the scale parameter. Under the assumption of sparsity, we show that appropriate selection of the hyperparameters in the beta prime prior leads to the (near) minimax posterior contraction rate when $p gg n$. For finite samples, we propose a data-adaptive method for estimating the hyperparameters based on marginal maximum likelihood (MML). This enables our prior to adapt to both sparse and dense settings, and under our proposed empirical Bayes procedure, the MML estimates are never at risk of collapsing to zero. We derive efficient Monte Carlo EM and variational EM algorithms for implementing our model, which are available in the R package NormalBetaPrime. Simulations and analysis of a gene expression data set illustrate our models self-adaptivity to varying levels of sparsity and signal strengths.
This paper develops a bias correction scheme for a multivariate heteroskedastic errors-in-variables model. The applicability of this model is justified in areas such as astrophysics, epidemiology and analytical chemistry, where the variables are subject to measurement errors and the variances vary with the observations. We conduct Monte Carlo simulations to investigate the performance of the corrected estimators. The numerical results show that the bias correction scheme yields nearly unbiased estimates. We also give an application to a real data set.
Quantile regression, the prediction of conditional quantiles, finds applications in various fields. Often, some or all of the variables are discrete. The authors propose two new quantile regression approaches to handle such mixed discrete-continuous data. Both of them generalize the continuous D-vine quantile regression, where the dependence between the response and the covariates is modeled by a parametric D-vine. D-vine quantile regression provides very flexible models, that enable accurate and fast predictions. Moreover, it automatically takes care of major issues of classical quantile regression, such as quantile crossing and interactions between the covariates. The first approach keeps the parametric estimation of the D-vines, but modifies the formulas to account for the discreteness. The second approach estimates the D-vine using continuous convolution to make the discrete variables continuous and then estimates the D-vine nonparametrically. A simulation study is presented examining for which scenarios the discrete-continuous D-vine quantile regression can provide superior prediction abilities. Lastly, the functionality of the two introduced methods is demonstrated by a real-world example predicting the number of bike rentals.
In actuarial practice the dependency between contract limitations (deductibles, copayments) and health care expenditures are measured by the application of the Monte Carlo simulation technique. We propose, for the same goal, an alternative approach based on Generalized Linear Model for Location, Scale and Shape (GAMLSS). We focus on the estimate of the ratio between the one-year reimbursement amount (after the effect of limitations) and the one year expenditure (before the effect of limitations). We suggest a regressive model to investigate the relation between this response variable and a set of covariates, such as limitations and other rating factors related to health risk. In this way a dependency structure between reimbursement and limitations is provided. The density function of the ratio is a mixture distribution, indeed it can continuously assume values mass at 0 and 1, in addition to the probability density within (0, 1) . This random variable does not belong to the exponential family, then an ordinary Generalized Linear Model is not suitable. GAMLSS introduces a probability structure compliant with the density of the response variable, in particular zero-one inflated beta density is assumed. The latter is a mixture between a Bernoulli distribution and a Beta distribution.
Corrupted data sets containing noisy or missing observations are prevalent in various contemporary applications such as economics, finance and bioinformatics. Despite the recent methodological and algorithmic advances in high-dimensional multi-response regression, how to achieve scalable and interpretable estimation under contaminated covariates is unclear. In this paper, we develop a new methodology called convex conditioned sequential sparse learning (COSS) for error-in-variables multi-response regression under both additive measurement errors and random missing data. It combines the strengths of the recently developed sequential sparse factor regression and the nearest positive semi-definite matrix projection, thus enjoying stepwise convexity and scalability in large-scale association analyses. Comprehensive theoretical guarantees are provided and we demonstrate the effectiveness of the proposed methodology through numerical studies.