Large, non-Gaussian spatial datasets pose a considerable modeling challenge: the dependence structure implied by the model must be captured at different scales while inference remains feasible. Skew-normal and skew-t distributions have only recently begun to appear in the spatial statistics literature, but with little attention to capturing dependence at multiple resolutions while keeping inference feasible for increasingly large datasets. This article presents the first multi-resolution spatial model inspired by the skew-t distribution, where a large-scale effect follows a multivariate normal distribution and the fine-scale effects follow multivariate skew-normal distributions. The resulting marginal distribution for each region is skew-t, thereby allowing for greater flexibility in capturing the skewness and heavy tails that characterize many environmental datasets. Likelihood-based inference is performed using a Monte Carlo EM algorithm. The model is applied as a stochastic generator of daily wind speeds over Saudi Arabia.
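As a rough illustration of the two distribution families named above, the following sketch draws univariate skew-normal and skew-t variates from their standard stochastic representations. It is not the multi-resolution model of the article: the function names, the skewness parameter `delta`, and the degrees of freedom `nu` are assumptions made for this example only.

```python
import math
import random


def sample_skew_normal(delta, rng):
    """Draw one standard skew-normal variate with skewness parameter delta in (-1, 1).

    Uses the representation Z = delta * |U0| + sqrt(1 - delta^2) * U1,
    where U0 and U1 are independent standard normal variates.
    """
    u0 = abs(rng.gauss(0.0, 1.0))
    u1 = rng.gauss(0.0, 1.0)
    return delta * u0 + math.sqrt(1.0 - delta * delta) * u1


def sample_skew_t(delta, nu, rng):
    """Draw one skew-t variate as a skew-normal divided by sqrt(chi2_nu / nu)."""
    # A chi-square variate with nu degrees of freedom is Gamma(shape=nu/2, scale=2).
    w = rng.gammavariate(nu / 2.0, 2.0)
    return sample_skew_normal(delta, rng) / math.sqrt(w / nu)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_skew_t(0.9, 5.0, rng) for _ in range(10)]
    print(draws)
```

Dividing by the random scale factor is what thickens the tails: the skew-normal draw supplies the asymmetry, and the chi-square mixing supplies the heavy tails. The mean of the skew-normal draw is `delta * sqrt(2 / pi)`, which gives a simple sanity check on the sampler.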
Gaussian processes are popular and flexible models for spatial, temporal, and functional data, but they are computationally infeasible for large datasets. We discuss Gaussian-process approximations that use basis functions at multiple resolutions to achieve fast inference and that can (approximately) represent any spatial covariance structure. We consider two special cases of this multi-resolution-approximation framework, a taper version and a domain-partitioning (block) version. We describe theoretical properties and inference procedures, and study the computational complexity of the methods. Numerical comparisons and an application to satellite data are also provided.
Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of high-resolution observations of spatial fields over large spatial regions. If these datasets can be efficiently exploited, they can provide new insights on a wide variety of issues. However, traditional spatial-statistical techniques such as kriging are not computationally feasible for big datasets. We propose a multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space. The M-RA process is specified as a linear combination of basis functions at multiple levels of spatial resolution, which can capture spatial structure from very fine to very large scales. The basis functions are automatically chosen to approximate a given covariance function, which can be nonstationary. All computations involving the M-RA, including parameter inference and prediction, are highly scalable for massive datasets. Crucially, the inference algorithms can also be parallelized to take full advantage of large distributed-memory computing environments. In comparisons using simulated data and a large satellite dataset, the M-RA outperforms a related state-of-the-art method.
Mixture of Experts (MoE) is a popular framework in statistics and machine learning for modeling heterogeneity in data for regression, classification and clustering. MoE models for continuous data are usually based on the normal distribution. However, the normal distribution is known to be unsuitable for data with asymmetric behavior, heavy tails and atypical observations. We introduce a new robust non-normal mixture of experts model based on the skew $t$ distribution. The proposed skew $t$ mixture of experts, named STMoE, addresses these limitations of the normal mixture of experts for possibly skewed, heavy-tailed and noisy data. We develop a dedicated expectation conditional maximization (ECM) algorithm to estimate the model parameters by monotonically maximizing the observed data log-likelihood. We describe how the presented model can be used in prediction and in model-based clustering of regression data. Numerical experiments carried out on simulated data show the effectiveness and the robustness of the proposed model in fitting non-linear regression functions as well as in model-based clustering. The proposed model is then applied to two real-world datasets: tone perception data for musical data analysis, and temperature anomaly data for the analysis of climate change. The obtained results confirm the usefulness of the model for practical data analysis applications.
Facing increasing domestic energy consumption from population growth and industrialization, Saudi Arabia is aiming to reduce its reliance on fossil fuels and to broaden its energy mix by expanding investment in renewable energy sources, including wind energy. A preliminary task in the development of wind energy infrastructure is the assessment of wind energy potential, a key aspect of which is the characterization of its spatio-temporal behavior. In this study we examine the impact of internal climate variability on seasonal wind power density fluctuations over Saudi Arabia using 30 simulations from the Large Ensemble Project (LENS) developed at the National Center for Atmospheric Research. Furthermore, a spatio-temporal model for daily wind speed is proposed with neighbor-based cross-temporal dependence, and a multivariate skew-t distribution to capture the spatial patterns of higher order moments. The model can be used to generate synthetic time series over the entire spatial domain that adequately reproduce the internal variability of the LENS dataset.
L_p-quantiles represent an important class of generalised quantiles and are defined as the minimisers of an expected asymmetric power function (see Chen, 1996). For p=1 and p=2 they correspond respectively to the quantiles and the expectiles. Koenker (1993) showed that the tau quantile and the tau expectile coincide for every tau in (0,1) for a class of rescaled Student t distributions with two degrees of freedom. Here, we extend this result by proving that for the Student t distribution with p degrees of freedom, the tau quantile and the tau L_p-quantile coincide for every tau in (0,1), and that the same holds for any affine transformation. Furthermore, we investigate the properties of L_p-quantiles and provide recursive equations for the truncated moments of the Student t distribution.
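The p=2 case can be checked numerically. The sketch below, a minimal illustration rather than the authors' method, compares the closed-form tau quantile of the standard Student t distribution with two degrees of freedom against the tau expectile, obtained by solving the first-order condition tau * E[(X - e)_+] = (1 - tau) * E[(e - X)_+] with bisection. It uses the unscaled t distribution, relying on the affine invariance stated above; all function names, the integration grid size, and the bracket [-20, 20] are assumptions made for this example.

```python
import math


def t2_pdf(x):
    """Density of the Student t distribution with 2 degrees of freedom."""
    return (2.0 + x * x) ** -1.5


def t2_quantile(tau):
    """Closed-form tau quantile, inverting F(x) = 1/2 + x / (2 * sqrt(2 + x^2))."""
    u = 2.0 * tau - 1.0
    return u * math.sqrt(2.0 / (1.0 - u * u))


def upper_partial_moment(e, n=2000):
    """Approximate E[(X - e)_+] with the midpoint rule.

    The half-line [e, inf) is mapped onto [0, 1) via x = e + t / (1 - t).
    """
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        y = t / (1.0 - t)               # y = x - e
        jac = 1.0 / (1.0 - t) ** 2      # dx/dt
        total += y * t2_pdf(e + y) * jac
    return total / n


def t2_expectile(tau, lo=-20.0, hi=20.0, iters=60):
    """Solve the expectile first-order condition by bisection on [lo, hi]."""

    def foc(e):
        # By symmetry of the density, E[(e - X)_+] = E[(X + e)_+].
        return tau * upper_partial_moment(e) - (1.0 - tau) * upper_partial_moment(-e)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0.0:   # foc is decreasing in e, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for tau in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(tau, t2_quantile(tau), t2_expectile(tau))
```

The two printed columns should agree up to the quadrature and bisection error. The default bracket covers tau roughly between 0.01 and 0.99; more extreme levels would need a wider bracket and a finer grid.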