No Arabic abstract
Causal models with unobserved variables impose nontrivial constraints on the distributions over the observed variables. When a common cause of two variables is unobserved, it is impossible to uncover the causal relation between them without making additional assumptions about the model. In this work, we consider causal models with a promise that unobserved variables have known cardinalities. We derive inequality constraints implied by d-separation in such models. Moreover, we explore the possibility of leveraging this result to study causal influence in models that involve quantum systems.
Directed graphical models specify noisy functional relationships among a collection of random variables. In the Gaussian case, each such model corresponds to a semi-algebraic set of positive definite covariance matrices. The set is given via parametrization, and much work has gone into obtaining an implicit description in terms of polynomial (in-)equalities. Implicit descriptions shed light on problems such as parameter identification, model equivalence, and constraint-based statistical inference. For models given by directed acyclic graphs, which represent settings where all relevant variables are observed, there is a complete theory: All conditional independence relations can be found via graphical $d$-separation and are sufficient for an implicit description. The situation is far more complicated, however, when some of the variables are hidden (or in other words, unobserved or latent). We consider models associated to mixed graphs that capture the effects of hidden variables through correlated error terms. The notion of trek separation explains when the covariance matrix in such a model has submatrices of low rank and generalizes $d$-separation. However, in many cases, such as the infamous Verma graph, the polynomials defining the graphical model are not determinantal, and hence cannot be explained by $d$-separation or trek-separation. In this paper, we show that these constraints often correspond to the vanishing of nested determinants and can be graphically explained by a notion of restricted trek separation.
We consider a bivariate time series $(X_t,Y_t)$ that is given by a simple linear autoregressive model. Assuming that the equations describing each variable as a linear combination of past values are considered structural equations, there is a clear meaning of how intervening on one particular $X_t$ influences $Y_{t}$ at later times $t>t$. In the present work, we describe conditions under which one can define a causal model between variables that are coarse-grained in time, thus admitting statements like `setting $X$ to $x$ changes $Y$ in a certain way without referring to specific time instances. We show that particularly simple statements follow in the frequency domain, thus providing meaning to interventions on frequencies.
We develop $D$-optimal designs for linear models with first-order interactions on a subset of the $2^K$ full factorial design region, when both the number of factors set to the higher level and the number of factors set to the lower level are simultaneously bounded by the same threshold. It turns out that in the case of narrow margins the optimal design is concentrated only on those design points, for which either the threshold is attained or the numbers of high and low levels are as equal as possible. In the case of wider margins the settings are more spread and the resulting optimal designs are as efficient as a full factorial design. These findings also apply to other optimality criteria.
We introduce estimation and test procedures through divergence minimization for models satisfying linear constraints with unknown parameter. Several statistical examples and motivations are given. These procedures extend the empirical likelihood (EL) method and share common features with generalized empirical likelihood (GEL). We treat the problems of existence and characterization of the divergence projections of probability measures on sets of signed finite measures. Our approach allows for a study of the estimates under misspecification. The asymptotic behavior of the proposed estimates are studied using the dual representation of the divergences and the explicit forms of the divergence projections. We discuss the problem of the choice of the divergence under various respects. Also we handle efficiency and robustness properties of minimum divergence estimates. A simulation study shows that the Hellinger divergence enjoys good efficiency and robustness properties.
Classical least squares estimators are well-known to be robust with respect to moment assumptions concerning the error distribution in a wide variety of finite-dimensional statistical problems; generally only a second moment assumption is required for least squares estimators to maintain the same rate of convergence that they would satisfy if the errors were assumed to be Gaussian. In this paper, we give a geometric characterization of the robustness of shape-restricted least squares estimators (LSEs) to error distributions with an $L_{2,1}$ moment, in terms of the `localized envelopes of the model. This envelope perspective gives a systematic approach to proving oracle inequalities for the LSEs in shape-restricted regression problems in the random design setting, under a minimal $L_{2,1}$ moment assumption on the errors. The canonical isotonic and convex regression models, and a more challenging additive regression model with shape constraints are studied in detail. Strikingly enough, in the additive model both the adaptation and robustness properties of the LSE can be preserved, up to error distributions with an $L_{2,1}$ moment, for estimating the shape-constrained proxy of the marginal $L_2$ projection of the true regression function. This holds essentially regardless of whether or not the additive model structure is correctly specified. The new envelope perspective goes beyond shape constrained models. Indeed, at a general level, the localized envelopes give a sharp characterization of the convergence rate of the $L_2$ loss of the LSE between the worst-case rate as suggested by the recent work of the authors [25], and the best possible parametric rate.