It is possible to approach regression analysis with random covariates from a semiparametric perspective where information is combined from multiple multivariate sources. The approach assumes a semiparametric density ratio model where multivariate distributions are regressed on a reference distribution. A kernel density estimator can be constructed from many data sources in conjunction with the semiparametric model. The estimator is shown to be more efficient than the traditional single-sample kernel density estimator, and its optimal bandwidth is discussed in some detail. Each multivariate distribution and the corresponding conditional expectation (regression) of interest are estimated from the combined data using all sources. Graphical and quantitative diagnostic tools are suggested to assess model validity. The method is applied in quantifying the effect of height and age on weight of germ cell testicular cancer patients. Comparisons are made with multiple regression, generalized additive models (GAM) and nonparametric kernel regression.
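The pooled estimator can be sketched in formulas. This sketch assumes the usual exponential-tilt form of the density ratio model, with a known vector function $\mathbf{h}$, reference density $g_0$, tilt parameters $(\alpha_k, \boldsymbol\beta_k)$ already estimated (for example by empirical likelihood), pooled sample $\mathbf{t}_1, \dots, \mathbf{t}_n$ with $n = n_0 + \dots + n_m$, $\rho_k = n_k / n_0$, and a kernel $K_b$ with bandwidth $b$. The notation is an assumption borrowed from the standard density ratio literature, not taken from this abstract.

```latex
% Density ratio model: each g_k is an exponential tilt of the reference g_0
g_k(\mathbf{x}) = \exp\{\alpha_k + \boldsymbol\beta_k^\top \mathbf{h}(\mathbf{x})\}\, g_0(\mathbf{x}),
  \qquad k = 1, \dots, m
% Probability masses on the pooled sample
\hat p_i = \frac{1}{n_0 \bigl[1 + \sum_{k=1}^{m} \rho_k
  \exp\{\hat\alpha_k + \hat{\boldsymbol\beta}_k^\top \mathbf{h}(\mathbf{t}_i)\}\bigr]},
  \qquad i = 1, \dots, n
% Kernel density estimates that use all n observations
\hat g_0(\mathbf{x}) = \sum_{i=1}^{n} \hat p_i\, K_b(\mathbf{x} - \mathbf{t}_i), \qquad
\hat g_k(\mathbf{x}) = \sum_{i=1}^{n} \hat p_i
  \exp\{\hat\alpha_k + \hat{\boldsymbol\beta}_k^\top \mathbf{h}(\mathbf{t}_i)\}\, K_b(\mathbf{x} - \mathbf{t}_i)
```

Because every $\hat g_k$ borrows all $n$ pooled observations rather than only its own $n_k$, this form makes plausible the efficiency gain over a single-sample kernel estimator; the regression of interest would then be read off the estimated joint density $\hat g_k$.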
We introduce a new class of semiparametric latent variable models for long-memory discretized event data. The proposed methodology is motivated by a study of bird vocalizations in the Amazon rain forest; the timings of vocalizations exhibit self-similarity and long-range dependence, ruling out models based on Poisson processes. The proposed class of FRActional Probit (FRAP) models is based on thresholding a latent process consisting of an additive expansion of a smooth Gaussian process with a fractional Brownian motion. We develop a Bayesian approach to inference using Markov chain Monte Carlo and show good performance in simulation studies. Applying the methods to the Amazon bird vocalization data, we find substantial evidence for self-similarity and non-Markovian, non-Poisson dynamics. To accommodate the bird vocalization data, in which many different species of birds exhibit their own vocalization dynamics, a hierarchical expansion of FRAP is provided in the Supplementary Materials.
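The generative side of a FRAP-type model can be illustrated by simulation: draw a smooth Gaussian process and a fractional Brownian motion on a time grid, add them, and threshold the sum to get binary event indicators. This is a minimal sketch under stated assumptions: the squared-exponential kernel, the zero threshold, the grid on (0, 1], the Hurst index and kernel parameters, and the function names are all hypothetical illustration choices, and the MCMC inference step is not shown.

```python
import numpy as np


def fbm_cov(t, hurst):
    """Covariance of fractional Brownian motion B_H evaluated at times t > 0."""
    s, u = np.meshgrid(t, t, indexing="ij")
    return 0.5 * (s ** (2 * hurst) + u ** (2 * hurst) - np.abs(s - u) ** (2 * hurst))


def se_cov(t, length_scale, variance):
    """Squared-exponential covariance (assumed kernel for the smooth GP)."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def simulate_frap(n=200, hurst=0.8, length_scale=0.1, variance=1.0, seed=0):
    """Simulate latent = smooth GP + fBm on a regular grid, then threshold at 0.

    All parameter values are hypothetical; they are not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1, dtype=float) / n  # grid on (0, 1]
    jitter = 1e-6 * np.eye(n)  # small diagonal term keeps Cholesky stable
    smooth = np.linalg.cholesky(se_cov(t, length_scale, variance) + jitter) @ rng.standard_normal(n)
    fbm = np.linalg.cholesky(fbm_cov(t, hurst) + jitter) @ rng.standard_normal(n)
    latent = smooth + fbm
    events = (latent > 0).astype(int)  # probit-style thresholding
    return latent, events


latent, events = simulate_frap(n=200, hurst=0.8, seed=1)
```

With a Hurst index above 0.5 the fBm increments are positively correlated, so the thresholded series tends to show long runs of events and non-events, which is the kind of long-range dependence a Poisson model cannot reproduce.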
Beta regression has been used extensively by statisticians and practitioners to model bounded continuous data, and it has no strong competitor sharing its main features. A class of normalized inverse-Gaussian (N-IG) processes was introduced in the literature and explored in the Bayesian context as a powerful alternative to the Dirichlet process. Until now, no attention has been paid to the univariate N-IG distribution in classical inference. In this paper, we propose the bessel regression based on the univariate N-IG distribution, which is a robust alternative to the beta model. This robustness is illustrated through simulated and real data applications. The parameters are estimated through an Expectation-Maximization (EM) algorithm, and we discuss how to perform inference. A useful and practical discrimination procedure is proposed for model selection between bessel and beta regressions. Monte Carlo simulation results are presented to verify the finite-sample behavior of the EM-based estimators and of the discrimination procedure. Further, the performance of both regressions is evaluated under misspecification, a critical scenario that shows the robustness of the proposed model. Finally, three empirical illustrations are explored to compare the results from bessel and beta regressions.
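For reference, the beta regression benchmark can be written in a few lines. This sketch assumes the common mean-precision parametrization, $y \sim \mathrm{Beta}(\mu\phi, (1-\mu)\phi)$ with a logit link for $\mu$ and a constant precision $\phi$, fitted by direct numerical maximum likelihood with SciPy. It is not the bessel model or the authors' EM algorithm, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln


def beta_reg_negloglik(params, X, y):
    """Negative log-likelihood of beta regression (mean-precision form).

    params = (coefficients..., log_phi); mean mu = expit(X @ coef).
    """
    coef, log_phi = params[:-1], params[-1]
    mu = expit(X @ coef)
    phi = np.exp(log_phi)  # log scale keeps the precision positive
    a, b = mu * phi, (1.0 - mu) * phi
    ll = (gammaln(phi) - gammaln(a) - gammaln(b)
          + (a - 1.0) * np.log(y) + (b - 1.0) * np.log1p(-y))
    return -np.sum(ll)


def fit_beta_regression(X, y):
    """Maximize the likelihood with BFGS starting from coef = 0, phi = 1."""
    start = np.zeros(X.shape[1] + 1)
    res = minimize(beta_reg_negloglik, start, args=(X, y), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1]), res


# Usage on simulated data with known coefficients and precision
rng = np.random.default_rng(42)
n = 2000
x = rng.uniform(-1, 1, n)
X = np.column_stack([np.ones(n), x])
mu = expit(X @ np.array([0.5, -1.0]))
y = rng.beta(mu * 30.0, (1 - mu) * 30.0)
coef, phi, res = fit_beta_regression(X, y)
```

The likelihood requires $0 < y < 1$ strictly; a competing model such as the proposed bessel regression would be compared against this fit on the same data.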
This paper introduces and analyzes a stochastic search method for parameter estimation in linear regression models in the spirit of Beran and Millar (1987). The idea is to generate a random finite subset of the parameter space that will automatically contain points very close to the unknown true parameter. The motivation for this procedure comes from recent work of Duembgen, Samworth and Schuhmacher (2011) on regression models with log-concave error distributions.
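One simple way to build a random finite candidate set, shown here only as an illustration, is to take exact fits through randomly chosen $p$-point (elemental) subsets of the data and then pick the best candidate under some criterion. The selection criterion below, the sum of absolute residuals, is a stand-in assumption and not the log-concave criterion that motivates the paper; the function names are hypothetical.

```python
import numpy as np


def elemental_candidates(X, y, n_candidates=500, seed=0):
    """Random finite candidate set: exact fits through random p-point subsets."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    cands = []
    for _ in range(n_candidates):
        idx = rng.choice(n, size=p, replace=False)
        Xs = X[idx]
        if abs(np.linalg.det(Xs)) < 1e-10:
            continue  # skip degenerate (singular) subsets
        cands.append(np.linalg.solve(Xs, y[idx]))
    return np.array(cands)


def stochastic_search(X, y, n_candidates=500, seed=0):
    """Return the candidate minimizing a hypothetical criterion (sum of |residuals|)."""
    cands = elemental_candidates(X, y, n_candidates, seed)
    losses = np.abs(y[None, :] - cands @ X.T).sum(axis=1)
    return cands[np.argmin(losses)], cands


# Usage: straight-line data with heavy-ish (Laplace) noise
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 5, n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.laplace(scale=0.5, size=n)
best, cands = stochastic_search(X, y, n_candidates=500, seed=2)
```

As the number of random subsets grows, some candidates land close to the true parameter with high probability, so any reasonable criterion evaluated over the finite set can only improve on a search restricted to fewer points.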
As a competitive alternative to least squares regression, quantile regression is popular for analyzing heterogeneous data. For a quantile regression model specified at a single quantile level $\tau$, the major difficulties of semiparametric efficient estimation are the unavailability of a parametric efficient score and the need for conditional density estimation. In this paper, with the help of the least favorable submodel technique, we first derive the semiparametric efficient scores for linear quantile regression models that are assumed for a single quantile level, multiple quantile levels and all the quantile levels in $(0,1)$, respectively. Our main discovery is a one-step (nearly) semiparametric efficient estimator for the regression coefficients of quantile regression models assumed for multiple quantile levels, which has several advantages: it can be regarded as an optimal way to pool information across multiple/other quantiles for efficiency gain; it is computationally feasible and easy to implement, as the initial estimator is easily available; and, owing to the nature of the quantile regression models under investigation, conditional density estimation is straightforward by plugging in an initial estimator. The resulting estimator is proved to achieve the corresponding semiparametric efficiency lower bound under regularity conditions. Numerical studies, including simulations and an example on the birth weight of children, confirm that the proposed estimator leads to higher efficiency than the Koenker-Bassett quantile regression estimator for all quantiles of interest.
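The Koenker-Bassett estimator used for comparison minimizes the check loss $\rho_\tau(r) = r(\tau - \mathbf{1}\{r < 0\})$ and can be computed as a linear program by splitting each residual into positive and negative parts. This sketch assumes SciPy's `linprog` with the HiGHS solver; it shows only the baseline estimator, not the one-step efficient estimator proposed here, and `koenker_bassett` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linprog


def koenker_bassett(X, y, tau):
    """Linear quantile regression at level tau via its LP formulation.

    minimize tau * sum(u) + (1 - tau) * sum(v)
    subject to X b + u - v = y, u >= 0, v >= 0, b free.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Usage: heteroscedastic data, so different quantiles have different slopes
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 2, n)
X = np.column_stack([np.ones(n), x])
y = 1 + 2 * x + (0.5 + 0.5 * x) * rng.standard_normal(n)
b50 = koenker_bassett(X, y, 0.5)
b90 = koenker_bassett(X, y, 0.9)
```

Each quantile level is fitted separately here, which is exactly the lack of pooling across quantiles that the proposed one-step estimator is designed to address.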
In fitting a mixture of linear regression models, a normal distribution is traditionally assumed for the error, and the regression parameters are then estimated by maximum likelihood (MLE). This procedure is not valid if the normality assumption is violated. To relax the normality assumption on the error distribution and hence reduce modeling bias, we propose a semiparametric mixture of linear regression models with unspecified error distributions. We establish a more general identifiability result under weaker conditions than existing results, construct a class of new estimators, and establish their asymptotic properties. These asymptotic results also apply to many existing semiparametric mixture regression estimators whose asymptotic properties have remained unknown because of the inherent difficulty of deriving them. Using simulation studies, we demonstrate the superiority of the proposed estimators over the MLE when the normal error assumption is violated and their comparable performance when the error is normal. An analysis of newly collected Equine Infectious Anemia Virus data from 2017 illustrates the usefulness of the new estimator.
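The normal-error MLE baseline is usually computed with the EM algorithm. This sketch assumes a $k$-component mixture of normal linear regressions with component-specific variances and a random initialization around ordinary least squares; it is the baseline the abstract argues against, not the proposed semiparametric estimator, and `em_mix_reg` is a hypothetical name.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm


def em_mix_reg(X, y, k=2, n_iter=200, seed=0):
    """EM for a mixture of k normal linear regressions.

    Returns coefficients (k, p), standard deviations (k,), mixing weights (k,)
    and the observed-data log-likelihood recorded at each iteration.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    betas = beta_ols + rng.normal(scale=1.0, size=(k, p))  # perturbed OLS start
    sigmas = np.full(k, np.std(y))
    pis = np.full(k, 1.0 / k)
    loglik = []
    for _ in range(n_iter):
        # E-step in log space to avoid underflow
        log_dens = np.log(pis) + norm.logpdf(y[:, None], loc=X @ betas.T, scale=sigmas)
        log_total = logsumexp(log_dens, axis=1, keepdims=True)
        loglik.append(log_total.sum())
        resp = np.exp(log_dens - log_total)  # posterior component probabilities
        # M-step: weighted least squares and weighted variance per component
        for j in range(k):
            w = resp[:, j]
            Xw = X * w[:, None]
            betas[j] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            r = y - X @ betas[j]
            sigmas[j] = np.sqrt((w * r ** 2).sum() / w.sum())
        pis = resp.mean(axis=0)
    return betas, sigmas, pis, np.array(loglik)


# Usage: two crossing regression lines with normal noise
rng = np.random.default_rng(3)
n = 400
x = rng.uniform(-1, 1, n)
X = np.column_stack([np.ones(n), x])
z = rng.random(n) < 0.5
y = np.where(z, 1 + 2 * x, -1 - 2 * x) + 0.3 * rng.standard_normal(n)
betas, sigmas, pis, ll = em_mix_reg(X, y, k=2, n_iter=100, seed=0)
```

Every step of this procedure leans on the normal density in the E-step; replacing that density with an unspecified error distribution is precisely the relaxation the abstract proposes.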