Partially linear additive models generalize linear models: some covariates are assumed to have a linear relation with the response, while each of the remaining covariates enters through an unknown univariate smooth function. The harmful effect of outliers, either in the residuals or in the covariates involved in the linear component, has been described for partially linear models, that is, when only one nonparametric component is involved in the model. When additive components are involved, the problem of providing reliable estimators in the presence of atypical data is of practical importance, motivating the need for robust procedures. Hence, we propose a family of robust estimators for partially linear additive models that combines $B$-splines with robust linear regression estimators. We obtain consistency results, rates of convergence, and asymptotic normality for the linear components under mild assumptions. A Monte Carlo study compares the performance of the robust proposal with its classical counterpart under different models and contamination schemes. The numerical experiments show the advantage of the proposed methodology for finite samples. We also illustrate the usefulness of the proposed approach on a real data set.
We consider averaging a number of candidate models to produce a prediction of lower risk in the context of partially linear functional additive models. These models incorporate the parametric effect of scalar variables and the additive effect of a functional variable to describe the relationship between a response variable and regressors. We develop a model averaging scheme that assigns the weights by minimizing a cross-validation criterion. Under a framework allowing model misspecification, the resulting estimator is proved to be asymptotically optimal in the sense of achieving the lowest possible squared error loss for prediction. Simulation studies and a real data analysis demonstrate the good performance of the proposed method.
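The weight-assignment step can be sketched as follows. This is a minimal illustration with nested ordinary linear models standing in for the paper's functional additive candidates; the leave-one-out shortcut and simplex-constrained minimization are standard, and all data and names are synthetic:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 200, 4
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)

# Candidate models: nested linear models using the first k covariates.
def loo_predictions(Xk, y):
    H = Xk @ np.linalg.solve(Xk.T @ Xk, Xk.T)   # hat matrix
    resid = y - H @ y
    return y - resid / (1.0 - np.diag(H))       # standard LOO shortcut

cv_pred = np.column_stack(
    [loo_predictions(X[:, :k], y) for k in range(1, p + 1)])

# Choose weights on the simplex minimizing the cross-validation criterion
# CV(w) = ||y - cv_pred @ w||^2.
obj = lambda w: np.sum((y - cv_pred @ w) ** 2)
cons = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
weights = minimize(obj, np.full(p, 1.0 / p),
                   bounds=[(0, 1)] * p, constraints=cons).x
```

Candidate 1 omits a covariate with coefficient 1, so the criterion places most of the weight on the candidates that include both signal covariates.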
This paper is concerned with model averaging estimation for partially linear functional score models. These models predict a scalar response using both the parametric effect of scalar predictors and the nonparametric effect of a functional predictor. Because the components of the nonparametric part are unobservable, the situation is more complicated than for ordinary partially linear models (PLM), and the theoretical derivation differs from that of PLM. Within this context, we develop a Mallows-type criterion for choosing weights. The resulting model averaging estimator is proved to be asymptotically optimal under certain regularity conditions, in the sense of achieving the smallest possible squared error loss. Simulation studies demonstrate its superiority to, or comparability with, model selection and averaging estimators based on information criterion scores. The proposed procedure is also applied to two real data sets for illustration.
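A Mallows-type weight criterion trades off in-sample fit against an effective-dimension penalty. The following sketch uses nested ordinary linear models as stand-ins for the paper's functional score candidates; the criterion form ||y - fits @ w||² + 2σ²(dims @ w) is the classical Mallows model averaging recipe, and all data and names are synthetic:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, p = 150, 3
X = rng.normal(size=(n, p))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

# Fitted values and effective dimensions of nested candidate models.
fits, dims = [], []
for k in range(1, p + 1):
    Xk = X[:, :k]
    H = Xk @ np.linalg.solve(Xk.T @ Xk, Xk.T)
    fits.append(H @ y)
    dims.append(k)
fits, dims = np.column_stack(fits), np.array(dims, dtype=float)

# Estimate sigma^2 from the largest model, then minimize the Mallows-type
# criterion C(w) = ||y - fits @ w||^2 + 2 sigma^2 (dims @ w) over the simplex.
sigma2 = np.sum((y - fits[:, -1]) ** 2) / (n - p)
crit = lambda w: np.sum((y - fits @ w) ** 2) + 2.0 * sigma2 * dims @ w
cons = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
w_hat = minimize(crit, np.full(p, 1.0 / p),
                 bounds=[(0, 1)] * p, constraints=cons).x
```

Unlike the cross-validation criterion, the penalty term 2σ²(dims @ w) explicitly charges each candidate for its effective dimension, so the underfitted candidate 1 receives little weight.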
We develop a unified approach to hypothesis testing for various types of widely used functional linear models, such as scalar-on-function, function-on-function and function-on-scalar models. In addition, the proposed test applies to models of mixed types, such as models with both functional and scalar predictors. In contrast to most existing methods, which rest on the large-sample distributions of test statistics, the proposed method leverages the technique of bootstrapping max statistics and exploits the variance decay that is an inherent feature of functional data, improving the empirical power of tests especially when the sample size is limited and the signal is relatively weak. Theoretical guarantees on the validity and consistency of the proposed test are provided uniformly for a class of test statistics.
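The core mechanics of a bootstrapped max statistic can be sketched as follows. This is a simplified stand-in, not the paper's test: it uses synthetic basis-coefficient scores whose variances decay across components, a max of studentized means as the test statistic, and a Gaussian multiplier bootstrap for its null distribution:

```python
import numpy as np

rng = np.random.default_rng(3)
n, J = 80, 20   # sample size, number of basis coefficients

# Synthetic score contributions under H0 (coefficient-wise means are zero);
# the decaying scale mimics the eigenvalue decay of functional data.
scale = 1.0 / np.arange(1, J + 1) ** 1.5
scores = rng.normal(size=(n, J)) * scale

mean = scores.mean(axis=0)
sd = scores.std(axis=0, ddof=1)
t_max = np.max(np.abs(np.sqrt(n) * mean / sd))   # max statistic

# Multiplier bootstrap: perturb the centered scores with i.i.d. standard
# normal multipliers and recompute the maximum to approximate the null law.
B = 1000
centered = scores - mean
boot = np.empty(B)
for b in range(B):
    g = rng.normal(size=n)
    boot[b] = np.max(np.abs(g @ centered / (np.sqrt(n) * sd)))
pval = (1 + np.sum(boot >= t_max)) / (B + 1)
```

Because the bootstrap replicates the finite-sample dependence across components, no large-sample distribution of the max statistic needs to be derived analytically.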
Generalized linear models (GLM) have been widely used in practice to model non-Gaussian response variables. When the number of explanatory features is relatively large, researchers are often interested in performing controlled feature selection in order to simplify the downstream analysis. This paper introduces a new framework for feature selection in GLMs that can achieve false discovery rate (FDR) control in two asymptotic regimes. The key step is to construct a mirror statistic measuring the importance of each feature, based upon two (asymptotically) independent estimates of the corresponding true coefficient obtained via either the data-splitting method or the Gaussian mirror method. FDR control is achieved by exploiting the property of mirror statistics that, for any null feature, the sampling distribution is (asymptotically) symmetric about 0. In the moderate-dimensional setting, in which the ratio between the dimension (number of features) p and the sample size n converges to a fixed value, we construct the mirror statistic based on the maximum likelihood estimate. In the high-dimensional setting, where p is much larger than n, we use the debiased Lasso to build the mirror statistic. Compared to the Benjamini-Hochberg procedure, which crucially relies on the asymptotic normality of the Z statistic, the proposed methodology is scale free, as it hinges only on the symmetry property, and is thus expected to be more robust in finite-sample cases. Both simulation results and a real data application show that the proposed methods are capable of controlling the FDR and are often more powerful than existing methods, including the Benjamini-Hochberg procedure and the knockoff filter.
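The data-splitting construction can be sketched as follows. This is a minimal illustration in a logistic GLM: scikit-learn's LogisticRegression with a very weak penalty stands in for the MLE, the sign-agreement form is one common choice of mirror statistic, and all data and names are synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n, p, s = 2000, 20, 5
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))

# Data splitting: two (asymptotically) independent estimates per coefficient,
# here via near-unpenalized logistic regression on each half of the data.
half = n // 2
mle = lambda Xh, yh: LogisticRegression(C=1e6, max_iter=1000).fit(Xh, yh).coef_[0]
b1, b2 = mle(X[:half], y[:half]), mle(X[half:], y[half:])

# Mirror statistic: large and positive when the two estimates agree;
# (asymptotically) symmetric about 0 for any null feature.
M = np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))

# Select with the smallest cutoff whose estimated FDP is below q, using the
# null symmetry: #{M <= -t} estimates the number of nulls with M >= t.
q = 0.1
selected = np.array([], dtype=int)
for t in np.sort(np.abs(M)):
    if np.sum(M <= -t) / max(np.sum(M >= t), 1) <= q:
        selected = np.where(M >= t)[0]
        break
```

Note that the cutoff rule uses only the symmetry of the null mirror statistics, with no reference to a normal null distribution, which is the sense in which the procedure is scale free.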
We study a scalar-on-function historical linear regression model, which assumes that the functional predictor does not influence the response after time passes a certain cutoff point. We approach this problem from the perspective of locally sparse modeling, where a function is locally sparse if it is zero on a substantial portion of its defining domain. In the historical linear model, the slope function is exactly a locally sparse function that is zero beyond the cutoff time. A locally sparse estimate then gives rise to an estimate of the cutoff time. We propose a nested group bridge penalty that is able to specifically shrink the tail of a function. Combined with the B-spline basis expansion and penalized least squares, the nested group bridge approach can identify the cutoff time and produce a smooth estimate of the slope function simultaneously. The proposed locally sparse estimator is shown to be consistent, and its numerical performance is illustrated by simulation studies. The proposed method is demonstrated with an application to determining the effect of past engine acceleration on current particulate matter emission.
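The structural fact the tail penalty exploits can be demonstrated directly: a B-spline function whose trailing coefficients are zero vanishes identically beyond a knot, so shrinking tail coefficients to zero yields a slope function that is exactly zero beyond an implied cutoff time. This sketch illustrates only that fact, not the full nested group bridge estimator; the knots and coefficients are synthetic:

```python
import numpy as np
from scipy.interpolate import BSpline

# Cubic B-spline expansion of a slope function beta(t) on [0, 1].
k = 3
knots = np.concatenate([np.zeros(k), np.linspace(0, 1, 11), np.ones(k)])
coef = np.zeros(len(knots) - k - 1)              # 13 basis coefficients
coef[:7] = [1.0, 2.0, 1.5, 0.5, 0.2, 0.1, 0.05]  # trailing coefficients zero
beta = BSpline(knots, coef, k)

# Basis function j is supported on [knots[j], knots[j + k + 1]], so with all
# coefficients from index 7 onward equal to zero, beta vanishes identically
# past knots[6 + k + 1]; that knot is the implied cutoff time.
cutoff = knots[6 + k + 1]                        # = 0.7 for these knots
t = np.linspace(0, 1, 201)
vals = beta(t)
```

In the estimation procedure, it is the penalty, rather than a hand-picked zero pattern, that drives the trailing coefficients to zero, and the last nonzero group then determines the estimated cutoff.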