No Arabic abstract
Little known relations of the renown concept of the halfspace depth for multivariate data with notions from convex and affine geometry are discussed. Halfspace depth may be regarded as a measure of symmetry for random vectors. As such, the depth stands as a generalization of a measure of symmetry for convex sets, well studied in geometry. Under a mild assumption, the upper level sets of the halfspace depth coincide with the convex floating bodies used in the definition of the affine surface area for convex bodies in Euclidean spaces. These connections enable us to partially resolve some persistent open problems regarding theoretical properties of the depth.
We investigate the problem of inferring the causal predictors of a response $Y$ from a set of $d$ explanatory variables $(X^1,dots,X^d)$. Classical ordinary least squares regression includes all predictors that reduce the variance of $Y$. Using only the causal predictors instead leads to models that have the advantage of remaining invariant under interventions, loosely speaking they lead to invariance across different environments or heterogeneity patterns. More precisely, the conditional distribution of $Y$ given its causal predictors remains invariant for all observations. Recent work exploits such a stability to infer causal relations from data with different but known environments. We show that even without having knowledge of the environments or heterogeneity pattern, inferring causal relations is possible for time-ordered (or any other type of sequentially ordered) data. In particular, this allows detecting instantaneous causal relations in multivariate linear time series which is usually not the case for Granger causality. Besides novel methodology, we provide statistical confidence bounds and asymptotic detection results for inferring causal predictors, and present an application to monetary policy in macroeconomics.
Information geometry uses the formal tools of differential geometry to describe the space of probability distributions as a Riemannian manifold with an additional dual structure. The formal equivalence of compositional data with discrete probability distributions makes it possible to apply the same description to the sample space of Compositional Data Analysis (CoDA). The latter has been formally described as a Euclidean space with an orthonormal basis featuring components that are suitable combinations of the original parts. In contrast to the Euclidean metric, the information-geometric description singles out the Fisher information metric as the only one keeping the manifolds geometric structure invariant under equivalent representations of the underlying random variables. Well-known concepts that are valid in Euclidean coordinates, e.g., the Pythogorean theorem, are generalized by information geometry to corresponding notions that hold for more general coordinates. In briefly reviewing Euclidean CoDA and, in more detail, the information-geometric approach, we show how the latter justifies the use of distance measures and divergences that so far have received little attention in CoDA as they do not fit the Euclidean geometry favored by current thinking. We also show how entropy and relative entropy can describe amalgamations in a simple way, while Aitchison distance requires the use of geometric means to obtain more succinct relationships. We proceed to prove the information monotonicity property for Aitchison distance. We close with some thoughts about new directions in CoDA where the rich structure that is provided by information geometry could be exploited.
We prove that any implementation of pivotal sampling is more efficient than multinomial sampling. This property entails the weak consistency of the Horvitz-Thompson estimator and the existence of a conservative variance estimator. A small simulation study supports our findings.
We present a joint copula-based model for insurance claims and sizes. It uses bivariate copulae to accommodate for the dependence between these quantities. We derive the general distribution of the policy loss without the restrictive assumption of independence. We illustrate that this distribution tends to be skewed and multi-modal, and that an independence assumption can lead to substantial bias in the estimation of the policy loss. Further, we extend our framework to regression models by combining marginal generalized linear models with a copula. We show that this approach leads to a flexible class of models, and that the parameters can be estimated efficiently using maximum-likelihood. We propose a test procedure for the selection of the optimal copula family. The usefulness of our approach is illustrated in a simulation study and in an analysis of car insurance policies.
Testing large covariance matrices is of fundamental importance in statistical analysis with high-dimensional data. In the past decade, three types of test statistics have been studied in the literature: quadratic form statistics, maximum form statistics, and their weighted combination. It is known that quadratic form statistics would suffer from low power against sparse alternatives and maximum form statistics would suffer from low power against dense alternatives. The weighted combination methods were introduced to enhance the power of quadratic form statistics or maximum form statistics when the weights are appropriately chosen. In this paper, we provide a new perspective to exploit the full potential of quadratic form statistics and maximum form statistics for testing high-dimensional covariance matrices. We propose a scale-invariant power enhancement test based on Fishers method to combine the p-values of quadratic form statistics and maximum form statistics. After carefully studying the asymptotic joint distribution of quadratic form statistics and maximum form statistics, we prove that the proposed combination method retains the correct asymptotic size and boosts the power against more general alternatives. Moreover, we demonstrate the finite-sample performance in simulation studies and a real application.