No Arabic abstract
The assumption of separability of the covariance operator for a random image or hypersurface can be of substantial use in applications, especially in situations where the accurate estimation of the full covariance structure is unfeasible, either for computational reasons, or due to a small sample size. However, inferential tools to verify this assumption are somewhat lacking in high-dimensional or functional {data analysis} settings, where this assumption is most relevant. We propose here to test separability by focusing on $K$-dimensional projections of the difference between the covariance operator and a nonparametric separable approximation. The subspace we project onto is one generated by the eigenfunctions of the covariance operator estimated under the separability hypothesis, negating the need to ever estimate the full non-separable covariance. We show that the rescaled difference of the sample covariance operator with its separable approximation is asymptotically Gaussian. As a by-product of this result, we derive asymptotically pivotal tests under Gaussian assumptions, and propose bootstrap methods for approximating the distribution of the test statistics. We probe the finite sample performance through simulations studies, and present an application to log-spectrogram images from a phonetic linguistics dataset.
We propose a novel approach to the analysis of covariance operators making use of concentration inequalities. First, non-asymptotic confidence sets are constructed for such operators. Then, subsequent applications including a k sample test for equality of covariance, a functional data classifier, and an expectation-maximization style clustering algorithm are derived and tested on both simulated and phoneme data.
Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary treatments and static interventions, and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by treatment. We present a theoretical study of an (in)direct effect decomposition of the population intervention effect, defined by stochastic interventions jointly applied to the treatment and mediators. In contrast to existing proposals, our causal effects can be evaluated regardless of whether a treatment is categorical or continuous and remain well-defined even in the presence of intermediate confounders affected by treatment. Our (in)direct effects are identifiable without a restrictive assumption on cross-world counterfactual independencies, allowing for substantive conclusions drawn from them to be validated in randomized controlled trials. Beyond the novel effects introduced, we provide a careful study of nonparametric efficiency theory relevant for the construction of flexible, multiply robust estimators of our (in)direct effects, while avoiding undue restrictions induced by assuming parametric models of nuisance parameter functionals. To complement our nonparametric estimation strategy, we introduce inferential techniques for constructing confidence intervals and hypothesis tests, and discuss open source software implementing the proposed methodology.
For testing two random vectors for independence, we consider testing whether the distance of one vector from a center point is independent from the distance of the other vector from a center point by a univariate test. In this paper we provide conditions under which it is enough to have a consistent univariate test of independence on the distances to guarantee that the power to detect dependence between the random vectors increases to one, as the sample size increases. These conditions turn out to be minimal. If the univariate test is distribution-free, the multivariate test will also be distribution-free. If we consider multiple center points and aggregate the center-specific univariate tests, the power may be further improved, and the resulting multivariate test may be distribution-free for specific aggregation methods (if the univariate test is distribution-free). We show that several multivariate tests recently proposed in the literature can be viewed as instances of this general approach.
We consider Gaussian measures $mu, tilde{mu}$ on a separable Hilbert space, with fractional-order covariance operators $A^{-2beta}$ resp. $tilde{A}^{-2tilde{beta}}$, and derive necessary and sufficient conditions on $A, tilde{A}$ and $beta, tilde{beta} > 0$ for I. equivalence of the measures $mu$ and $tilde{mu}$, and II. uniform asymptotic optimality of linear predictions for $mu$ based on the misspecified measure $tilde{mu}$. These results hold, e.g., for Gaussian processes on compact metric spaces. As an important special case, we consider the class of generalized Whittle-Matern Gaussian random fields, where $A$ and $tilde{A}$ are elliptic second-order differential operators, formulated on a bounded Euclidean domain $mathcal{D}subsetmathbb{R}^d$ and augmented with homogeneous Dirichlet boundary conditions. Our outcomes explain why the predictive performances of stationary and non-stationary models in spatial statistics often are comparable, and provide a crucial first step in deriving consistency results for parameter estimation of generalized Whittle-Matern fields.
We offer a survey of recent results on covariance estimation for heavy-tailed distributions. By unifying ideas scattered in the literature, we propose user-friendly methods that facilitate practical implementation. Specifically, we introduce element-wise and spectrum-wise truncation operators, as well as their $M$-estimator counterparts, to robustify the sample covariance matrix. Different from the classical notion of robustness that is characterized by the breakdown property, we focus on the tail robustness which is evidenced by the connection between nonasymptotic deviation and confidence level. The key observation is that the estimators needs to adapt to the sample size, dimensionality of the data and the noise level to achieve optimal tradeoff between bias and robustness. Furthermore, to facilitate their practical use, we propose data-driven procedures that automatically calibrate the tuning parameters. We demonstrate their applications to a series of structured models in high dimensions, including the bandable and low-rank covariance matrices and sparse precision matrices. Numerical studies lend strong support to the proposed methods.