Statistical modeling of animal movement is of critical importance. The continuous trajectory of an animal's movement is only observed at discrete, often irregularly spaced time points. Most existing models cannot handle unequal sampling intervals naturally and/or do not allow inactivity periods such as resting or sleeping. The recently proposed moving-resting (MR) model is a Brownian motion governed by a telegraph process, which allows periods of inactivity in one state of the telegraph process. The MR model shows promise in modeling the movements of predators with long inactive periods, such as many felids, but its failure to accommodate measurement errors seriously limits its application in practice. Here we incorporate measurement errors into the MR model and derive basic properties of the model. Inferences are based on a composite likelihood that uses the Markov property of the chain composed of every other observed increment. The performance of the method is validated in finite-sample simulation studies. An application to the movement data of a mountain lion in Wyoming illustrates the utility of the method.
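To make the data-generating mechanism concrete, the following is a minimal one-dimensional simulation sketch of a moving-resting process observed with noise at irregular times. It is only an illustration under stated assumptions: exponential sojourn times in each state, Gaussian measurement error, a start in the moving state at position 0, and hypothetical parameter names (`lam_move`, `lam_rest`, `sigma`, `sigma_err`) that are not taken from the paper. It does not implement the composite likelihood.

```python
import numpy as np


def simulate_moving_resting(obs_times, lam_move, lam_rest, sigma, sigma_err, rng):
    """Simulate a 1-D moving-resting path observed with Gaussian noise.

    Assumptions (illustrative only): sojourns in the moving state are
    Exponential(rate=lam_move), sojourns in the resting state are
    Exponential(rate=lam_rest), and the process starts moving at position 0.
    """
    obs_times = np.asarray(obs_times, dtype=float)
    t_end = obs_times[-1]

    # Telegraph process: alternate states until the last observation is covered.
    switch_times, states = [0.0], []
    moving, t = True, 0.0
    while t < t_end:
        rate = lam_move if moving else lam_rest
        t += rng.exponential(1.0 / rate)
        states.append(moving)
        switch_times.append(t)
        moving = not moving
    switch_times = np.array(switch_times)
    states = np.array(states)

    def moving_time(s):
        # Total time spent in the moving state during [0, s].
        overlap = np.minimum(switch_times[1:], s) - switch_times[:-1]
        return float(np.sum(np.clip(overlap, 0.0, None)[states]))

    m = np.array([moving_time(s) for s in obs_times])
    dm = np.diff(np.concatenate(([0.0], m)))

    # Brownian motion only accumulates variance while the animal is moving.
    true_pos = np.cumsum(sigma * np.sqrt(dm) * rng.standard_normal(len(dm)))
    observed = true_pos + sigma_err * rng.standard_normal(len(true_pos))
    return true_pos, observed, m


rng = np.random.default_rng(42)
times = np.cumsum(rng.exponential(1.0, size=20))  # irregular sampling times
true_pos, observed, moving_time_total = simulate_moving_resting(
    times, lam_move=1.0, lam_rest=0.5, sigma=1.0, sigma_err=0.1, rng=rng
)
```

Because the true position is unchanged over any interval spent entirely at rest, an observed increment over such an interval consists of measurement noise alone, which is why ignoring the error term can distort inference about resting periods.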
Non-homogeneous Poisson processes are used in a wide range of scientific disciplines, from the environmental sciences to the health sciences. Often, the central object of interest in a point process is the underlying intensity function. Here, we present a general model for the intensity function of a non-homogeneous Poisson process using measure transport. The model is built from a flexible bijective mapping that maps from the underlying intensity function of interest to a simpler reference intensity function. We enforce bijectivity by modeling the map as a composition of multiple simple bijective maps, and show that the model exhibits an important approximation property. Estimation of the flexible mapping is accomplished within an optimization framework, wherein computations are efficiently done using recent technological advances in deep learning and a graphics processing unit. Although we find that intensity function estimates obtained with our method are not necessarily superior to those obtained using conventional methods, the modeling representation brings with it other advantages such as facilitated point process simulation and uncertainty quantification. Modeling point processes in higher dimensions is also facilitated using our approach. We illustrate the use of our model on both simulated data and a real data set containing the locations of seismic events near Fiji since 1964.
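The simulation advantage mentioned above can be seen in a one-dimensional toy case. The sketch below assumes a known increasing bijection T on [0, 1] that sends the process of interest to a homogeneous reference process, so that the intensity is `total_mass * T'(x)`; points are then simulated by pushing uniform draws through the inverse map. The function name and the choice T(x) = x^2 are hypothetical, and the fixed map stands in for the learned compositional map described in the abstract.

```python
import numpy as np


def simulate_via_transport(total_mass, inverse_map, rng):
    """Simulate a Poisson process on [0, 1] with intensity total_mass * T'(x).

    `inverse_map` is T^{-1}, where T is an increasing bijection of [0, 1]
    that maps the target process to a homogeneous reference process.
    """
    n = rng.poisson(total_mass)          # number of points in [0, 1]
    u = rng.uniform(0.0, 1.0, size=n)    # points of the reference process
    return np.sort(inverse_map(u))       # pull back through T^{-1}


rng = np.random.default_rng(0)
# Toy map T(x) = x**2, so T^{-1}(u) = sqrt(u) and intensity(x) = 2 * total_mass * x.
points = simulate_via_transport(total_mass=200.0, inverse_map=np.sqrt, rng=rng)
```

No thinning or rejection step is needed here: once the map is available, each simulated point costs one uniform draw and one evaluation of the inverse map.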
Gaussian processes (GPs) are highly flexible function estimators used for geospatial analysis, nonparametric regression, and machine learning, but they are computationally infeasible for large datasets. Vecchia approximations of GPs have been used to enable fast evaluation of the likelihood for parameter inference. Here, we study Vecchia approximations of spatial predictions at observed and unobserved locations, including obtaining joint predictive distributions at large sets of locations. We consider a general Vecchia framework for GP predictions, which contains some novel and some existing special cases. We study the accuracy and computational properties of these approaches theoretically and numerically, proving that our new methods exhibit linear computational complexity in the total number of spatial locations. We show that certain choices within the framework can have a strong effect on uncertainty quantification and computational cost, which leads to specific recommendations on which methods are most suitable for various settings. We also apply our methods to a satellite dataset of chlorophyll fluorescence, showing that the new methods are faster or more accurate than existing methods, and reduce unrealistic artifacts in prediction maps.
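As a rough illustration of the basic Vecchia idea for likelihood evaluation (not the prediction framework studied in the paper), the sketch below replaces the joint Gaussian density by a product of univariate conditionals, each conditioning on at most `m` previously ordered points. The exponential covariance, its parameters, and all function names are assumptions made for this example.

```python
import numpy as np


def exp_cov(a, b, variance=1.0, length=0.3):
    """Exponential covariance between 1-D location arrays a and b (assumed kernel)."""
    return variance * np.exp(-np.abs(a[:, None] - b[None, :]) / length)


def vecchia_loglik(y, x, m, cov=exp_cov):
    """Sum of log p(y_i | y_c(i)), with c(i) the m nearest previously ordered points."""
    total = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        # Naive neighbor search for clarity; a spatial index would be used in practice.
        c = prev[np.argsort(np.abs(x[prev] - x[i]))[:m]]
        k_ii = cov(x[i:i + 1], x[i:i + 1])[0, 0]
        if len(c) > 0:
            k_ci = cov(x[c], x[i:i + 1])[:, 0]
            w = np.linalg.solve(cov(x[c], x[c]), k_ci)  # kriging weights, O(m^3)
            mean, var = w @ y[c], k_ii - w @ k_ci
        else:
            mean, var = 0.0, k_ii
        total += -0.5 * (np.log(2.0 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total


def exact_loglik(y, x, cov=exp_cov):
    """Exact zero-mean GP log-likelihood via a dense Cholesky factor, O(n^3)."""
    chol = np.linalg.cholesky(cov(x, x))
    z = np.linalg.solve(chol, y)
    return -0.5 * (z @ z) - np.sum(np.log(np.diag(chol))) - 0.5 * len(y) * np.log(2.0 * np.pi)


rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, size=200))
y = np.linalg.cholesky(exp_cov(x, x)) @ rng.standard_normal(200)
approx, exact = vecchia_loglik(y, x, m=5), exact_loglik(y, x)
```

Apart from the neighbor search, each term costs O(m^3), so the total cost grows linearly in the number of locations for fixed `m`. In this particular toy setting (sorted one-dimensional locations with an exponential covariance) the process is Markov, so conditioning on previous neighbors reproduces the exact likelihood up to rounding; in two or more dimensions the approximation is generally not exact.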
Statistical agencies are often asked to produce small area estimates (SAEs) for positively skewed variables. When domain sample sizes are too small to support direct estimators, the effects of skewness in the response variable can be large. As such, it is important to appropriately account for the distribution of the response variable given available auxiliary information. Motivated by this issue, and in order to stabilize the skewness and achieve normality in the response variable, we propose an area-level log-measurement error model on the response variable. Then, under our proposed modeling framework, we derive an empirical Bayes (EB) predictor of positive small area quantities when the covariates contain measurement error. We propose corresponding estimators of the mean squared prediction error (MSPE) of the EB predictor using both a jackknife and a bootstrap method. We show that the order of the bias is $O(m^{-1})$, where $m$ is the number of small areas. Finally, we investigate the performance of our methodology using both design-based and model-based simulation studies.
In the stochastic frontier model, the composed error term consists of the measurement error and the inefficiency term. A common assumption is that the inefficiency term follows a truncated normal or exponential distribution. A wide variety of models require evaluating the cumulative distribution function of the composed error term. This work introduces and proves four representation theorems for these distributions, two for each distributional assumption. These representations can be utilized for fast and accurate evaluation.
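For orientation, the sketch below evaluates the composed-error CDF in the normal-exponential case using the standard closed form obtained by integrating the normal CDF against the exponential density. It assumes the production-frontier convention eps = v - u with v ~ N(0, sigma_v^2) and u ~ Exponential with mean sigma_u; the function names are hypothetical, and this textbook formula is not claimed to be one of the four representations proved in the paper.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def composed_error_cdf(eps, sigma_v, sigma_u):
    """P(v - u <= eps) for v ~ N(0, sigma_v^2) and u ~ Exponential(mean=sigma_u).

    Closed form: Phi(eps/sigma_v)
                 + exp(eps/sigma_u + sigma_v^2 / (2 sigma_u^2))
                   * Phi(-eps/sigma_v - sigma_v/sigma_u).
    Note: the exponential factor can overflow for extremely large eps/sigma_u,
    which is one reason numerically careful representations are useful.
    """
    a = eps / sigma_u + sigma_v ** 2 / (2.0 * sigma_u ** 2)
    b = -eps / sigma_v - sigma_v / sigma_u
    return norm_cdf(eps / sigma_v) + math.exp(a) * norm_cdf(b)


p = composed_error_cdf(0.0, sigma_v=0.5, sigma_u=1.0)
```

The second term is a product of a growing exponential and a vanishing normal tail probability, so a direct evaluation like this one loses accuracy or fails in the far tails.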
Several methods have been proposed in the spatial statistics literature for the analysis of big data sets in continuous domains. However, new methods for analyzing high-dimensional areal data are still scarce. Here, we propose a scalable Bayesian modeling approach for smoothing mortality (or incidence) risks in high-dimensional data, that is, when the number of small areas is very large. The method is implemented in the R add-on package bigDM. Model fitting and inference are based on the idea of divide and conquer and use integrated nested Laplace approximations and numerical integration. We analyze the proposal's empirical performance in a comprehensive simulation study that considers two model-free settings. Finally, the methodology is applied to analyze male colorectal cancer mortality in Spanish municipalities, showing its benefits over the standard approach in terms of goodness of fit and computational time.