Empirical researchers often trim observations with a small denominator A when they estimate moments of the form E[B/A]. Heavy trimming is a common way to reduce variance, but it incurs a large trimming bias. This paper provides a novel method for correcting large trimming bias. If a researcher is willing to assume that the joint distribution of A and B is smooth, then a large trimming bias can be estimated well. Using this bias correction, we also develop valid and robust inference for E[B/A].
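To make the trimming bias concrete, the following is a minimal sketch rather than the paper's method: it computes a plain trimmed sample mean of B/A and does not attempt any bias correction. The function name `trimmed_ratio_mean`, the rule of setting trimmed terms to zero, and the toy design (A uniform on (0, 1), B = A·Z with Z independent of A and E[Z] = 1) are all assumptions chosen for illustration. Under this toy design E[B/A] = 1, while the trimmed mean with threshold h targets 1 - h, so the bias grows with h.

```python
import random


def trimmed_ratio_mean(a, b, threshold):
    """Sample mean of B/A with terms where |A| <= threshold set to zero.

    Returns (estimate, share_kept). Hypothetical helper for illustration only;
    it performs no bias correction.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    total = 0.0
    kept = 0
    for a_i, b_i in zip(a, b):
        if abs(a_i) > threshold:
            total += b_i / a_i
            kept += 1
    return total / len(a), kept / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    # Toy design (an assumption): A ~ Uniform(0, 1), B = A * Z with
    # Z ~ N(1, 1) independent of A, so E[B/A] = 1 and the trimmed
    # mean with threshold h has expectation 1 - h.
    a = [rng.random() for _ in range(n)]
    b = [a_i * rng.gauss(1.0, 1.0) for a_i in a]
    for h in (0.0, 0.05, 0.2):
        est, share = trimmed_ratio_mean(a, b, h)
        print(f"h={h:.2f}  estimate={est:.3f}  share kept={share:.3f}")
```

In this design the ratio B/A is well behaved, so the sketch isolates the bias side of the trade-off only; it does not reproduce the variance problem that motivates trimming in practice.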
A robust estimator for a wide family of mixtures of linear regressions is presented. Robustness is based on the joint adoption of the Cluster Weighted Model and of an estimator based on trimming and restrictions. The selected model provides the conditional distribution of the response for each group, as in mixtures of regressions, and further supplies local distributions for the explanatory variables. A novel version of the restrictions has been devised, under this model, for separately controlling the two sources of variability identified in it. This proposal avoids singularities in the log-likelihood, caused by approximate local collinearity in the explanatory variables or local exact fit in regressions, and reduces the occurrence of spurious local maximizers. Owing to the interaction between the model and the estimator, the procedure naturally resists the harmful influence of bad leverage points during the estimation of the mixture of regressions, which is still an open issue in the literature. The given methodology defines a well-posed statistical problem, whose estimator exists and is consistent for the corresponding solution of the population problem, under very general conditions. A feasible EM algorithm is also provided to compute the estimates. Many simulated examples and two real datasets have been chosen to show the ability of the procedure, on the one hand, to detect anomalous data, and, on the other hand, to identify the true cluster regressions without the influence of contamination.
With the rapid development of data collection and aggregation technologies in many scientific disciplines, it is becoming increasingly common to conduct large-scale or online regression to analyze real-world data and uncover real-world evidence. In such applications, it is often computationally challenging or sometimes infeasible to store the entire dataset in memory. Consequently, classical batch-based estimation methods that involve the entire dataset are less attractive or no longer applicable. Instead, recursive estimation methods such as stochastic gradient descent, which process data points sequentially, are more appealing because they offer both numerical convenience and memory efficiency. In this paper, for scalable estimation of large or online survival data, we propose a stochastic gradient descent method that recursively updates the estimates in an online manner as data points arrive sequentially in streams. Theoretical results such as asymptotic normality and estimation efficiency are established to justify its validity. Furthermore, to quantify the uncertainty associated with the proposed stochastic gradient descent estimator and facilitate statistical inference, we develop a scalable resampling strategy tailored to the large-scale or online setting. Simulation studies and a real data application are also provided to assess its performance and illustrate its practical utility.
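The abstract does not state the update rule, so the following is only a generic sketch of recursive estimation, not the proposed survival estimator or its resampling scheme. It swaps in ordinary least squares as the loss and runs one-pass stochastic gradient descent with a running average of the iterates, which keeps memory use at O(dim) regardless of stream length. The names `online_sgd` and `simulate`, the step-size schedule `step0 * t ** (-decay)`, and the simulated data are all assumptions made for illustration.

```python
import random


def online_sgd(stream, dim, step0=0.2, decay=0.6):
    """One-pass SGD for least squares with a running average of the iterates.

    `stream` yields (x, y) pairs, x being a sequence of length `dim`.
    The step size at observation t is step0 * t ** (-decay); both values
    are assumed tuning choices, not taken from the source.
    """
    theta = [0.0] * dim      # current iterate
    theta_bar = [0.0] * dim  # running mean of all iterates so far
    for t, (x, y) in enumerate(stream, start=1):
        residual = y - sum(th * xj for th, xj in zip(theta, x))
        rate = step0 * t ** (-decay)
        # Gradient of 0.5 * residual**2 with respect to theta is -residual * x.
        theta = [th + rate * residual * xj for th, xj in zip(theta, x)]
        # Update the running mean without storing past iterates.
        theta_bar = [tb + (th - tb) / t for tb, th in zip(theta_bar, theta)]
    return theta, theta_bar


def simulate(n, beta, noise_sd=0.5, seed=0):
    """Yield n simulated (x, y) pairs from a linear model with an intercept."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in beta[1:]]
        y = sum(bj * xj for bj, xj in zip(beta, x)) + rng.gauss(0.0, noise_sd)
        yield x, y


if __name__ == "__main__":
    beta = [1.0, -2.0, 0.5]
    last, averaged = online_sgd(simulate(20_000, beta), dim=len(beta))
    print("last iterate:    ", [round(v, 3) for v in last])
    print("averaged iterate:", [round(v, 3) for v in averaged])
```

Because `simulate` is a generator, each observation is discarded after its update, which mirrors the streaming setting described above. Adapting the sketch to survival data would require replacing the squared-error gradient with one derived from the chosen survival model.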
This paper introduces readers to the new concept and methodology of confidence distributions and to modern distributional inference in statistics. The discussion should be of interest to those who wish to study statistical inference methodology in depth and to use distribution estimators in practice. We also discuss generalized fiducial inference, a special type of modern distributional inference, and relate it to the concept of a confidence distribution. Several real data examples are provided for practitioners. We hope that the selected content covers the greater part of the developments on this subject.
This paper considers fixed effects estimation and inference in linear and nonlinear panel data models with random coefficients and endogenous regressors. The quantities of interest -- means, variances, and other moments of the random coefficients -- are estimated by cross sectional sample moments of GMM estimators applied separately to the time series of each individual. To deal with the incidental parameter problem introduced by the noise of the within-individual estimators in short panels, we develop bias corrections. These corrections are based on higher-order asymptotic expansions of the GMM estimators and produce improved point and interval estimates in moderately long panels. Under asymptotic sequences where the cross sectional and time series dimensions of the panel pass to infinity at the same rate, the uncorrected estimator has an asymptotic bias of the same order as the asymptotic variance. The bias corrections remove the bias without increasing variance. An empirical example on cigarette demand based on Becker, Grossman and Murphy (1994) shows significant heterogeneity in the price effect across U.S. states.
Motivated by recent data analyses in biomedical imaging studies, we consider a class of image-on-scalar regression models for imaging responses and scalar predictors. We propose using flexible multivariate splines over triangulations to handle the irregular domain of the objects of interest on the images, as well as other characteristics of images. The proposed estimators of the coefficient functions are proved to be root-n consistent and asymptotically normal under some regularity conditions. We also provide a consistent and computationally efficient estimator of the covariance function. Asymptotic pointwise confidence intervals and data-driven simultaneous confidence corridors for the coefficient functions are constructed. Our method can simultaneously estimate and make inferences on the coefficient functions while incorporating spatial heterogeneity and spatial correlation. A highly efficient and scalable estimation algorithm is developed. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed method, which is then applied to the spatially normalized positron emission tomography data of the Alzheimer's Disease Neuroimaging Initiative.