Additive models, as a natural generalization of linear regression, have played an important role in studying nonlinear relationships. Despite a rich literature and many recent advances on the topic, the statistical inference problem in additive models is still relatively poorly understood. Motivated by the inference for the exposure effect and other applications, we tackle in this paper the statistical inference problem for $f_1'(x_0)$ in additive models, where $f_1$ denotes the univariate function of interest and $f_1'(x_0)$ denotes its first order derivative evaluated at a specific point $x_0$. The main challenge for this local inference problem is understanding and controlling the additional uncertainty that arises from having to estimate the other components of the additive model as nuisance functions. To address this, we propose a decorrelated local linear estimator, which is particularly useful in reducing the effect of the nuisance function estimation error on the estimation accuracy of $f_1'(x_0)$. We establish the asymptotic limiting distribution for the proposed estimator and then construct confidence intervals and hypothesis testing procedures for $f_1'(x_0)$. The variance level of the proposed estimator is of the same order as that of the local least squares estimator in nonparametric regression, or equivalently in the additive model with one component, while the bias of the proposed estimator is jointly determined by the statistical accuracies in estimating the nuisance functions and the relationship between the variable of interest and the nuisance variables. The method is developed for general additive models and is demonstrated in the high-dimensional sparse setting.
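For orientation, the baseline mentioned above (local least squares in an additive model with a single component) can be sketched as follows. This is a minimal illustration of the ordinary local linear derivative estimate, not the decorrelated estimator proposed in the paper; the function name `local_linear_slope`, the Epanechnikov kernel, and the bandwidth `h` are assumptions made for this example.

```python
def local_linear_slope(x, y, x0, h):
    """Ordinary local linear estimate of f'(x0) for a single covariate.

    Assumptions for this sketch: Epanechnikov kernel, user-chosen
    bandwidth h, and no nuisance components to adjust for.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - x0
        u = d / h
        if abs(u) >= 1.0:
            continue  # the kernel gives zero weight outside the window
        w = 0.75 * (1.0 - u * u)  # Epanechnikov kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("too few distinct points inside the kernel window")
    # Slope from the weighted normal equations of y ~ a + b * (x - x0)
    return (s0 * t1 - s1 * t0) / det


# Usage on noiseless data from a straight line with slope 3
xs = [i / 50 for i in range(51)]
ys = [2.0 + 3.0 * xi for xi in xs]
slope = local_linear_slope(xs, ys, x0=0.5, h=0.2)
```

On exactly linear data the weighted fit reproduces the line, so the estimate equals the true slope up to rounding. In an additive model with several components, this plain fit would also absorb error from the nuisance functions, which is the issue the decorrelation step is meant to address.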
The density ratio model (DRM) provides a flexible and useful platform for combining information from multiple sources. In this paper, we consider statistical inference under two-sample DRMs with additional parameters defined through estimating equations and/or auxiliary information expressed as estimating equations. We examine the asymptotic properties of the maximum empirical likelihood estimators (MELEs) of the unknown parameters in the DRMs and/or those defined through estimating equations, and establish the chi-square limiting distributions for the empirical likelihood ratio (ELR) statistics. We show that the asymptotic variance of the MELEs of the unknown parameters does not decrease if one estimating equation is dropped. Similar properties are obtained for inferences on the cumulative distribution function and quantiles of each of the populations involved. We also propose an ELR test for the validity and usefulness of the auxiliary information. Simulation studies show that correctly specified estimating equations for the auxiliary information result in more efficient estimators and shorter confidence intervals. Two real-data examples are used for illustration.
We study the maximum score statistic to detect and estimate local signals in the form of change-points in the level, slope, or other property of a sequence of observations, and to segment the sequence when there appear to be multiple changes. We find that when observations are serially dependent, the change-points can lead to upwardly biased estimates of autocorrelations, resulting in a sometimes serious loss of power. Examples involving temperature variations, the level of atmospheric greenhouse gases, suicide rates and daily incidence of COVID-19 illustrate the general theory.
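As a rough illustration of a maximum score statistic for a single change in level, the sketch below scans all split points and standardizes the centered partial sum. It assumes independent observations with a known standard deviation `sigma`, so it deliberately ignores the serial dependence discussed above; the function name `max_score_level_change` is hypothetical.

```python
import math


def max_score_level_change(obs, sigma=1.0):
    """Scan for one change in level under independence (illustrative only).

    Returns (t_hat, z_max): t_hat is the number of observations before the
    estimated change, z_max the maximized standardized statistic.
    """
    n = len(obs)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(obs)
    partial = 0.0
    best_t, best_z = None, -1.0
    for t in range(1, n):
        partial += obs[t - 1]
        # Centered partial sum, scaled by its standard deviation under
        # the no-change hypothesis with independent observations.
        z = abs(partial - t * total / n) / (sigma * math.sqrt(t * (1.0 - t / n)))
        if z > best_z:
            best_t, best_z = t, z
    return best_t, best_z


# Usage: ten observations at level 0 followed by ten at level 5
t_hat, z_max = max_score_level_change([0.0] * 10 + [5.0] * 10)
```

With serially dependent data, `sigma` would have to account for autocorrelation, and estimating it from a sequence that contains change-points is where the upward bias described above can enter.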
We give an overview of the usefulness of the concepts of equivariance and invariance in the design of experiments for generalized linear models. In contrast to linear models, pairs of transformations have to be considered here, which act simultaneously on the experimental settings and on the location parameters in the linear component. Given the transformation of the experimental settings, the parameter transformations are not unique and may be nonlinear, so that further use can be made of the model structure. The general concepts and results are illustrated by models with gamma distributed response. Locally optimal and maximin efficient designs are obtained for the common D- and IMSE-criteria.
The Gini index is a popular inequality measure with many applications in social and economic studies. This paper studies semiparametric inference on the Gini indices of two semicontinuous populations. We characterize the distribution of each semicontinuous population by a mixture of a discrete point mass at zero and a continuous skewed positive component. A semiparametric density ratio model is then employed to link the positive components of the two distributions. We propose the maximum empirical likelihood estimators of the two Gini indices and their difference, and further investigate the asymptotic properties of the proposed estimators. The asymptotic results enable us to construct confidence intervals and perform hypothesis tests for the two Gini indices and their difference. We show that the proposed estimators are more efficient than the existing fully nonparametric estimators. The proposed estimators and the asymptotic results are also applicable to cases without excessive zero values. Simulation studies show the superiority of our proposed method over existing methods. Two real-data applications are presented using the proposed methods.
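For reference, the fully nonparametric plug-in estimate of a Gini index, the kind of baseline the semiparametric estimators are compared against, can be computed from the ordered sample. The sketch below is not the maximum empirical likelihood estimator of the paper; the function name `gini_index` is an assumption for this example, and zeros in a semicontinuous sample are simply kept as observations.

```python
def gini_index(values):
    """Plug-in sample Gini index for nonnegative data (zeros allowed)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    if xs[0] < 0:
        raise ValueError("values must be nonnegative")
    total = sum(xs)
    if total <= 0:
        raise ValueError("at least one value must be positive")
    # G = 2 * sum_i i * x_(i) / (n * sum_i x_i) - (n + 1) / n,
    # where x_(1) <= ... <= x_(n) is the ordered sample.
    weighted = sum(rank * v for rank, v in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Usage: a perfectly equal sample and a sample with excess zeros
g_equal = gini_index([1, 1, 1, 1])
g_zeros = gini_index([0, 0, 0, 1])
```

An equal sample gives an index of 0, while a sample whose total is held by one unit out of n gives (n - 1)/n, the largest value this plug-in form can take.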
We study a functional linear regression model that deals with functional responses and allows for both functional covariates and high-dimensional vector covariates. The proposed model is flexible and nests several functional regression models in the literature as special cases. Based on the theory of reproducing kernel Hilbert spaces (RKHS), we propose a penalized least squares estimator that can accommodate functional variables observed on discrete sample points. Besides a conventional smoothness penalty, a group Lasso-type penalty is further imposed to induce sparsity in the high-dimensional vector predictors. We derive finite sample theoretical guarantees and show that the excess prediction risk of our estimator is minimax optimal. Furthermore, our analysis reveals an interesting phase transition phenomenon that the optimal excess risk is determined jointly by the smoothness and the sparsity of the functional regression coefficients. A novel efficient optimization algorithm based on iterative coordinate descent is devised to handle the smoothness and group penalties simultaneously. Simulation studies and real data applications illustrate the promising performance of the proposed approach compared to the state-of-the-art methods in the literature.