No Arabic abstract
Irregular functional data in which densely sampled curves are observed over different ranges pose a challenge for modeling and inference, and sensitivity to outlier curves is a concern in applications. Motivated by applications in quantitative ultrasound signal analysis, this paper investigates a class of robust M-estimators for partially observed functional data including functional location and quantile estimators. Consistency of the estimators is established under general conditions on the partial observation process. Under smoothness conditions on the class of M-estimators, asymptotic Gaussian process approximations are established and used for large sample inference. The large sample approximations justify a bootstrap approximation for robust inferences about the functional response process. The performance is demonstrated in simulations and in the analysis of irregular functional data from quantitative ultrasound analysis.
Classical semiparametric inference with missing outcome data is not robust to contamination of the observed data and a single observation can have arbitrarily large influence on estimation of a parameter of interest. This sensitivity is exacerbated when inverse probability weighting methods are used, which may overweight contaminated observations. We introduce inverse probability weighted, double robust and outcome regression estimators of location and scale parameters, which are robust to contamination in the sense that their influence function is bounded. We give asymptotic properties and study finite sample behaviour. Our simulated experiments show that contamination can be more serious a threat to the quality of inference than model misspecification. An interesting aspect of our results is that the auxiliary outcome model used to adjust for ignorable missingness by some of the estimators, is also useful to protect against contamination. We also illustrate through a case study how both adjustment to ignorable missingness and protection against contamination are achieved through weighting schemes, which can be contrasted to gain further insights.
Functional principal component analysis is essential in functional data analysis, but the inferences will become unconvincing when some non-Gaussian characteristics occur, such as heavy tail and skewness. The focus of this paper is to develop a robust functional principal component analysis methodology in dealing with non-Gaussian longitudinal data, for which sparsity and irregularity along with non-negligible measurement errors must be considered. We introduce a Kendalls $tau$ function whose particular properties make it a nice proxy for the covariance function in the eigenequation when handling non-Gaussian cases. Moreover, the estimation procedure is presented and the asymptotic theory is also established. We further demonstrate the superiority and robustness of our method through simulation studies and apply the method to the longitudinal CD4 cell count data in an AIDS study.
This paper is concerned with model averaging estimation for partially linear functional score models. These models predict a scalar response using both parametric effect of scalar predictors and non-parametric effect of a functional predictor. Within this context, we develop a Mallows-type criterion for choosing weights. The resulting model averaging estimator is proved to be asymptotically optimal under certain regularity conditions in terms of achieving the smallest possible squared error loss. Simulation studies demonstrate its superiority or comparability to information criterion score-based model selection and averaging estimators. The proposed procedure is also applied to two real data sets for illustration. That the components of nonparametric part are unobservable leads to a more complicated situation than ordinary partially linear models (PLM) and a different theoretical derivation from those of PLM.
Partially observed cured data occur in the analysis of spontaneous abortion (SAB) in observational studies in pregnancy. In contrast to the traditional cured data, such data has an observable `cured portion as women who do not abort spontaneously. The data is also subject to left truncate in addition to right-censoring because women may enter or withdraw from a study any time during their pregnancy. Left truncation in particular causes unique bias in the presence of a cured portion. In this paper, we study a cure rate model and develop a conditional nonparametric maximum likelihood approach. To tackle the computational challenge we adopt an EM algorithm making use of ghost copies of the data, and a closed form variance estimator is derived. Under suitable assumptions, we prove the consistency of the resulting estimator involving an unbounded cumulative baseline hazard function, as well as the asymptotic normality. Simulation results are carried out to evaluate the finite sample performance. We present the analysis of the motivating SAB study to illustrate the power of our model addressing both occurrence and timing of SAB, as compared to existing approaches in practice.
A novel approach to perform unsupervised sequential learning for functional data is proposed. Our goal is to extract reference shapes (referred to as templates) from noisy, deformed and censored realizations of curves and images. Our model generalizes the Bayesian dense deformable template model (Allassonni`ere et al., 2007), a hierarchical model in which the template is the function to be estimated and the deformation is a nuisance, assumed to be random with a known prior distribution. The templates are estimated using a Monte Carlo version of the online Expectation-Maximization algorithm, extending the work from Cappe and Moulines (2009). Our sequential inference framework is significantly more computationally efficient than equivalent batch learning algorithms, especially when the missing data is high-dimensional. Some numerical illustrations on curve registration problem and templates extraction from images are provided to support our findings.