With the availability of more non-Euclidean data objects, statisticians are faced with the task of developing appropriate statistical methods. For regression models in which the predictors lie in $\mathbb{R}^p$ and the response variables are situated in a metric space, conditional Frechet means can be used to define the Frechet regression function. Global and local Frechet methods have recently been developed for modeling and estimating this regression function as extensions of multiple and local linear regression, respectively. This paper expands on these methodologies by proposing the Frechet Single Index (FSI) model and utilizing local Frechet regression along with $M$-estimation to estimate both the index and the underlying regression function. The method is illustrated by simulations for response objects on the surface of the unit sphere and through an analysis of human mortality data in which lifetable data are represented by distributions of age-of-death, viewed as elements of the Wasserstein space of distributions.
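To make the Wasserstein-space setting concrete, here is a minimal sketch, not the estimator proposed in the abstract. It relies on the standard fact that, for univariate distributions, the 2-Wasserstein distance is the $L^2$ distance between quantile functions and the Frechet mean is the pointwise average of quantile functions. The function names, the probability grid, and the Gaussian examples are assumptions made for illustration only.

```python
import numpy as np
from statistics import NormalDist


def quantile_grid(n=999):
    """Equally spaced probability levels in (0, 1), endpoints excluded."""
    return np.arange(1, n + 1) / (n + 1)


def wasserstein2(q1, q2):
    """Approximate 2-Wasserstein distance between two univariate
    distributions given their quantile functions on a common uniform grid."""
    return float(np.sqrt(np.mean((q1 - q2) ** 2)))


def frechet_mean_quantiles(qs):
    """Wasserstein Frechet mean of univariate distributions:
    the pointwise average of their quantile functions."""
    return np.mean(np.asarray(qs), axis=0)


def normal_quantiles(mu, sigma, u):
    """Quantile function of N(mu, sigma^2) evaluated at the levels u."""
    nd = NormalDist(mu, sigma)
    return np.array([nd.inv_cdf(p) for p in u])


# Hypothetical example: two Gaussian "age-at-death" stand-ins.
u = quantile_grid()
q_a = normal_quantiles(0.0, 1.0, u)
q_b = normal_quantiles(2.0, 1.0, u)

dist_ab = wasserstein2(q_a, q_b)             # pure location shift of 2
q_mean = frechet_mean_quantiles([q_a, q_b])  # quantiles of the barycenter
```

Because the two example distributions differ only by a location shift, the distance equals the size of the shift and the Frechet mean is the Gaussian centered halfway between them.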
Single index models provide an effective dimension reduction tool in regression, especially for high dimensional data, by projecting a general multivariate predictor onto a direction vector. We propose a novel single-index model for regression models where metric space-valued random object responses are coupled with multivariate Euclidean predictors. The responses in this regression model are complex, non-Euclidean data such as covariance matrices, graph Laplacians of networks, and univariate probability distribution functions, among other objects that lie in abstract metric spaces. Frechet regression has provided an approach for modeling the conditional mean of such random objects given multivariate Euclidean vectors, but it does not provide for regression parameters such as slopes or intercepts, since the metric space-valued responses are not amenable to linear operations. We show here that for the case of multivariate Euclidean predictors, the parameters that define a single index and associated projection vector can be used to substitute for the inherent absence of parameters in Frechet regression. Specifically, we derive the asymptotic consistency of suitable estimates of these parameters subject to an identifiability condition. Consistent estimation of the link function of the single index Frechet regression model is obtained through local Frechet regression. We demonstrate the finite sample performance of estimation for the proposed single index Frechet regression model through simulation studies, including the special cases of probability distributions and graph adjacency matrices. The method is also illustrated for resting-state functional Magnetic Resonance Imaging (fMRI) data from the ADNI study.
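The model described above can be written out as follows. The notation ($\Omega$, $d$, $m_\oplus$, $g_\oplus$, $\theta_0$) is assumed here for illustration and is not taken from the abstract. In particular, the normalization shown is only one common identifiability convention for single index models, and the condition actually used by the authors may differ.

$$
m_\oplus(x) \;=\; \operatorname*{arg\,min}_{\omega \in \Omega} \; E\!\left[ d^2(Y, \omega) \mid X = x \right],
\qquad
m_\oplus(x) \;=\; g_\oplus\!\left(\theta_0^\top x\right),
\qquad
\|\theta_0\| = 1,\ \ \theta_{0,1} > 0 .
$$

Here $(\Omega, d)$ is the metric space in which the response $Y$ takes values, $X \in \mathbb{R}^p$ is the predictor, $\theta_0$ is the projection vector that plays the role of the regression parameters, and $g_\oplus$ is the link function, which takes values in $\Omega$ and is the object estimated by local Frechet regression on the one-dimensional index $\theta_0^\top x$.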
We investigate R-optimal designs for multi-response regression models with multiple factors, where the random errors in these models are correlated. Several theoretical results are derived for R-optimal designs, including scale invariance, reflection symmetry, line and plane symmetry, and dependence on the covariance matrix of the errors. All the results can be applied to linear and nonlinear models. In addition, an efficient algorithm based on an interior point method is developed for finding R-optimal designs on discrete design spaces. The algorithm is very flexible, and can be applied to any multi-response regression model.
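As a rough illustration of the criterion involved, the sketch below evaluates the R-optimality objective, the product of the diagonal entries of the inverse information matrix, for a few designs on a discrete design space. It is deliberately simplified: it assumes a single-response straight-line model with regressor $f(x) = (1, x)$ and uncorrelated errors, and it does not implement the interior point algorithm mentioned in the abstract. All function names and design points are hypothetical.

```python
import numpy as np


def information_matrix(points, weights):
    """M(xi) = sum_i w_i f(x_i) f(x_i)^T for the toy regressor f(x) = (1, x)."""
    F = np.column_stack([np.ones_like(points), points])
    return F.T @ (weights[:, None] * F)


def r_criterion(points, weights):
    """R-optimality objective: product of the diagonal of M(xi)^{-1}.
    Smaller values are better."""
    M_inv = np.linalg.inv(information_matrix(points, weights))
    return float(np.prod(np.diag(M_inv)))


# Equal weights on the endpoints of [-1, 1] versus a uniform three-point design.
two_point = r_criterion(np.array([-1.0, 1.0]), np.array([0.5, 0.5]))
three_point = r_criterion(np.array([-1.0, 0.0, 1.0]), np.full(3, 1.0 / 3.0))

# In this toy model, reflecting a design (x -> -x) leaves the objective unchanged.
w = np.array([0.3, 0.7])
asymmetric = r_criterion(np.array([-1.0, 0.5]), w)
reflected = r_criterion(np.array([1.0, -0.5]), w)
```

In this toy setting the two-point design attains a smaller objective than the three-point design, and the reflected design has the same objective value as the original one.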
Modern-day problems in statistics often face the challenge of exploring and analyzing complex non-Euclidean object data that do not conform to vector space structures or operations. Examples of such data objects include covariance matrices, graph Laplacians of networks and univariate probability distribution functions. In the current contribution, a new concurrent regression model is proposed to characterize the time-varying relation between an object in a general metric space (as response) and a vector in $\mathbb{R}^p$ (as predictor), where concepts from Frechet regression are employed. Concurrent regression has been a well-developed area of research for Euclidean predictors and responses, with many important applications for longitudinal studies and functional data. We develop generaliz
Environmental variability often has substantial impacts on natural populations and communities through its effects on the performance of individuals. Because organisms' responses to environmental conditions are often nonlinear (e.g., decreasing performance on both sides of an optimal temperature), the mean response is often different from the response in the mean environment. Ye et al. (2020) proposed testing for the presence of such variance effects on individual or population growth rates by estimating the Jensen Effect, the difference in average growth rates under varying versus fixed environments, in functional single index models for environmental effects on growth. In this paper, we extend this analysis to the effect of environmental variance on reproduction and survival, which have count and binary outcomes. In the standard generalized linear models used to analyze such data, the direction of the Jensen Effect is tacitly assumed a priori by the model's link function. Here we extend the methods of Ye et al. (2020) using a generalized single index model to test whether this assumed direction is contradicted by the data. We show that our test has reasonable power under mild alternatives, but requires sample sizes that are larger than are often available. We demonstrate our methods on a long-term time series of plant ground cover on the Idaho steppe.
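The Jensen Effect defined above can be illustrated numerically. The sketch below is not the single index test from the abstract. It only computes the quantity itself, the average response under a varying environment minus the response at the mean environment, for a hypothetical concave performance curve with an optimum at 20 degrees. The curve, the simulated temperatures, and all names are assumptions.

```python
import numpy as np


def performance(temp, optimum=20.0, width=10.0):
    """Hypothetical performance curve that declines on both sides of an optimum."""
    return 1.0 - ((temp - optimum) / width) ** 2


# Simulated varying environment (assumed: Gaussian temperatures around the optimum).
rng = np.random.default_rng(0)
env = rng.normal(loc=20.0, scale=3.0, size=10_000)

mean_response_varying = float(np.mean(performance(env)))  # average over varying environment
response_at_mean_env = float(performance(np.mean(env)))   # fixed environment at the mean

# Jensen Effect: varying minus fixed. Negative here because the curve is concave.
jensen_effect = mean_response_varying - response_at_mean_env
```

For this quadratic curve the Jensen Effect equals minus the environmental variance divided by the squared width, so greater variability lowers average performance. A convex response would give the opposite sign, which is the direction question the abstract's test addresses.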
Field observations form the basis of many scientific studies, especially in ecological and social sciences. Despite efforts to conduct such surveys in a standardized way, observations can be prone to systematic measurement errors. The removal of systematic variability introduced by the observation process, if possible, can greatly increase the value of these data. Existing non-parametric techniques for correcting such errors assume linear additive noise models. This leads to biased estimates when applied to generalized linear models (GLM). We present an approach based on residual functions to address this limitation. We then demonstrate its effectiveness on synthetic data and show that it reduces systematic detection variability in moth surveys.