In statistics, independent, identically distributed random samples do not carry a natural ordering, and their statistics are typically invariant with respect to permutations of their order. Thus, an $n$-sample in a space $M$ can be considered as an element of the quotient space of $M^n$ modulo the permutation group. The present paper takes this definition of sample space and the related concept of orbit types as a starting point for developing a geometric perspective on statistics. We aim to derive a general mathematical setting for studying the behavior of empirical and population means in spaces ranging from smooth Riemannian manifolds to general stratified spaces. We fully describe the orbifold and path-metric structure of the sample space when $M$ is a manifold or path-metric space, respectively. These results are non-trivial even when $M$ is Euclidean. We show that the infinite sample space exists in a Gromov-Hausdorff type sense and coincides with the Wasserstein space of probability distributions on $M$. We exhibit Fréchet means and $k$-means as metric projections onto 1-skeleta or $k$-skeleta in Wasserstein space, and we define a new and more general notion of polymeans. This geometric characterization via metric projections applies equally to sample and population means, and we use it to establish asymptotic properties of polymeans such as consistency and asymptotic normality.
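The projection picture can be made concrete in the simplest case, $M = \mathbb{R}$. The sketch below is illustrative only and is not taken from the paper: it assumes that the 1-skeleton consists of point masses $\delta_y$, and it uses the standard fact that the squared 2-Wasserstein distance from an empirical measure to $\delta_y$ equals the mean squared distance to $y$. The names `frechet_functional`, `sample`, and `grid` are hypothetical.

```python
import random

def frechet_functional(sample, y):
    """Mean squared distance from the sample points to y.

    For an empirical measure on the real line this equals the squared
    2-Wasserstein distance to the point mass at y (assumption stated in
    the lead-in: the 1-skeleton is the set of point masses).
    """
    return sum((x - y) ** 2 for x in sample) / len(sample)

# A small sample on the real line (illustrative values).
sample = [0.0, 1.0, 4.0, 7.0]
mean = sum(sample) / len(sample)

# Brute-force "metric projection": search a grid of candidate points y
# for the one whose point mass is closest to the empirical measure.
grid = [i / 100 for i in range(0, 701)]
grid_minimizer = min(grid, key=lambda y: frechet_functional(sample, y))

# Reordering the sample does not change the functional, which reflects
# the permutation invariance that motivates the quotient construction.
shuffled = sample[:]
random.Random(1).shuffle(shuffled)
same_value = abs(
    frechet_functional(shuffled, mean) - frechet_functional(sample, mean)
) < 1e-12
```

In this Euclidean case the grid minimizer agrees with the arithmetic mean, which is the Fréchet mean of the sample.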
Designing experiments for generalized linear models is difficult because optimal designs depend on unknown parameters. Here we investigate local optimality. We propose to study for a given design its region of optimality in parameter space. Often these regions are semi-algebraic and feature interesting symmetries. We demonstrate this with the Rasch Poisson counts model. For any given interaction order between the explanatory variables we give a characterization of the regions of optimality of a special saturated design. This extends known results from the case of no interaction. We also give an algebraic and geometric perspective on optimality of experimental designs for the Rasch Poisson counts model using polyhedral and spectrahedral geometry.
Gaussian double Markovian models consist of covariance matrices constrained by a pair of graphs specifying zeros simultaneously in the covariance matrix and its inverse. We study the semi-algebraic geometry of these models, in particular their dimension, smoothness and connectedness. Results on their vanishing ideals and conditional independence ideals are also included, and we put them into the general framework of conditional independence models. We end with several open questions and conjectures.
Matching methods are widely used for causal inference in observational studies. Among them, nearest neighbor matching is arguably the most popular. However, nearest neighbor matching does not generally yield an average treatment effect estimator that is $\sqrt{n}$-consistent (Abadie and Imbens, 2006). Are matching methods not $\sqrt{n}$-consistent in general? In this paper, we study a recent class of matching methods that use integer programming to directly target aggregate covariate balance as opposed to finding close neighbor matches. We show that under suitable conditions these methods can yield simple estimators that are $\sqrt{n}$-consistent and asymptotically optimal.
Within the class of reflexive Banach spaces, we prove a metric characterization of the class of asymptotic-$c_0$ spaces in terms of a bi-Lipschitz invariant which involves metrics that generalize the Hamming metric on $k$-subsets of $\mathbb{N}$. We apply this characterization to show that the class of separable, reflexive, and asymptotic-$c_0$ Banach spaces is non-Borel co-analytic. Finally, we introduce a relaxation of the asymptotic-$c_0$ property, called the asymptotic-subsequential-$c_0$ property, which is a partial obstruction to the equi-coarse embeddability of the sequence of Hamming graphs. We present examples of spaces that are asymptotic-subsequential-$c_0$. In particular, $T^*(T^*)$ is asymptotic-subsequential-$c_0$, where $T^*$ is Tsirelson's original space.
A scoring rule is a loss function measuring the quality of a quoted probability distribution $Q$ for a random variable $X$, in the light of the realized outcome $x$ of $X$; it is proper if the expected score, under any distribution $P$ for $X$, is minimized by quoting $Q=P$. Using the fact that any differentiable proper scoring rule on a finite sample space $\mathcal{X}$ is the gradient of a concave homogeneous function, we consider when such a rule can be local in the sense of depending only on the probabilities quoted for points in a nominated neighborhood of $x$. Under mild conditions, we characterize such a proper local scoring rule in terms of a collection of homogeneous functions on the cliques of an undirected graph on the space $\mathcal{X}$. A useful property of such rules is that the quoted distribution $Q$ need only be known up to a scale factor. Examples of the use of such scoring rules include Besag's pseudo-likelihood and Hyvärinen's method of ratio matching.
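The definition of propriety can be checked numerically on a small finite sample space. The sketch below is an illustration and not the construction of the paper: it uses the logarithmic score $S(x, Q) = -\log q(x)$, a standard proper rule that depends on $Q$ only through $q(x)$, and it does not illustrate the scale-invariance property, since the logarithmic score requires a normalized $Q$. The helper names `log_score`, `expected_score`, and `random_distribution` are hypothetical.

```python
import math
import random

def log_score(q, x):
    """Logarithmic score, treated as a loss: -log q[x]."""
    return -math.log(q[x])

def expected_score(p, q, score=log_score):
    """Expected score of the quoted distribution q when X is drawn from p."""
    return sum(p[x] * score(q, x) for x in range(len(p)) if p[x] > 0)

def random_distribution(k, rng):
    """A random strictly positive probability vector of length k."""
    weights = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
p = [0.5, 0.3, 0.2]  # the true distribution P (illustrative values)

# Quoting Q = P gives the Shannon entropy of P as the expected loss.
best = expected_score(p, p)

# Propriety: no other quoted Q should achieve a lower expected loss.
# This samples many alternatives; it is a spot check, not a proof.
never_beaten = all(
    expected_score(p, random_distribution(len(p), rng)) >= best - 1e-12
    for _ in range(1000)
)
```

For the logarithmic score the inequality follows from Gibbs' inequality, so the check is expected to pass for every sampled $Q$.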