Mediation analysis is a statistical tool for testing hypotheses about whether the relationship between two variables is direct or operates indirectly via a third variable. Assessing statistical significance has been an area of active research; however, assessment of statistical power has been hampered by the lack of closed-form calculations and the need for substantial amounts of computational simulation. The current work provides a detailed explanation of how to implement large-scale simulation procedures within a shared computing cluster environment. In addition, all results and code for implementing these procedures are publicly available. The resulting power analyses compare the effects of sample size and of the strength and direction of the relationships among the three variables. Comparisons of three confidence interval calculation methods demonstrated that the bias-corrected method is optimal and requires approximately ten fewer participants than the percentile method to achieve equivalent power. Differing strengths of distal and proximal effects were compared and did not differentially affect the power to detect mediation effects. Suppression effects were explored and demonstrated that, when no relationship is observed between two variables, entering the mediating variable into the model can reveal a suppressed relationship. The power to detect suppression effects is similar to that for unsuppressed mediation. These results and their methods provide important information about the power of mediation models for study planning. Of greater importance, the methods lay the groundwork for assessing the statistical power of more complicated models involving multiple mediators and moderators.
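As a rough illustration of the kind of simulation this abstract describes, the sketch below estimates power for a single-mediator model by repeatedly simulating data and checking whether a bootstrap confidence interval for the indirect effect excludes zero. It is not the authors' code. The data-generating model (standard normal errors, linear paths), the function names, and the default simulation sizes are all assumptions made for illustration, and the simpler percentile interval is used here rather than the bias-corrected interval the abstract favors.

```python
import numpy as np


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b for the path X -> M -> Y."""
    ones = np.ones_like(x)
    # a: slope of M regressed on X (with intercept).
    coef_m, *_ = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)
    # b: coefficient of M when Y is regressed on X and M (with intercept).
    coef_y, *_ = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)
    return coef_m[1] * coef_y[2]


def percentile_ci(x, m, y, n_boot=500, alpha=0.05, rng=None):
    """Percentile bootstrap interval for the indirect effect."""
    rng = np.random.default_rng(rng)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper


def estimate_power(n, a, b, c_prime, n_sims=200, n_boot=500, seed=0):
    """Share of simulated datasets whose interval excludes zero.

    Assumed (hypothetical) model: M = a*X + e1, Y = c_prime*X + b*M + e2,
    with X, e1, e2 independent standard normal.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.standard_normal(n)
        m = a * x + rng.standard_normal(n)
        y = c_prime * x + b * m + rng.standard_normal(n)
        lower, upper = percentile_ci(x, m, y, n_boot=n_boot, rng=rng)
        rejections += int(lower > 0 or upper < 0)
    return rejections / n_sims


if __name__ == "__main__":
    # Small settings so the example runs quickly; real studies use far more.
    print(estimate_power(n=100, a=0.3, b=0.3, c_prime=0.0, n_sims=50, n_boot=200))
```

Each call to `estimate_power` is independent of every other call, so a grid of sample sizes and effect sizes can be split across the nodes of a shared cluster, one grid cell per job.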
The impracticality of posterior sampling has prevented the widespread adoption of spike-and-slab priors in high-dimensional applications. To alleviate the computational burden, optimization strategies have been proposed that quickly find local posterior modes. Trading off uncertainty quantification for computational speed, these strategies have enabled spike-and-slab deployments at scales that were previously infeasible. We build on one recent development in this strand of work: the Spike-and-Slab LASSO procedure of Ročková and George (2018). Instead of optimization, however, we explore multiple avenues for posterior sampling, some traditional and some new. Intrigued by the speed of Spike-and-Slab LASSO mode detection, we explore the possibility of sampling from an approximate posterior by performing MAP optimization on many independently perturbed datasets. To this end, we explore Bayesian bootstrap ideas and introduce a new class of jittered Spike-and-Slab LASSO priors with random shrinkage targets. These priors are a key constituent of the Bayesian Bootstrap Spike-and-Slab LASSO (BB-SSL) method proposed here. BB-SSL turns fast optimization into approximate posterior sampling. Beyond its scalability, we show that BB-SSL has strong theoretical support. Indeed, we find that the induced pseudo-posteriors contract around the truth at a near-optimal rate in sparse normal-means and in high-dimensional regression. We compare our algorithm to the traditional Stochastic Search Variable Selection (under Laplace priors) as well as many state-of-the-art methods for shrinkage priors. We show, both in simulations and on real data, that our method fares superbly in these comparisons, often providing substantial computational gains.
We are concerned with nonparametric hypothesis testing of time series functionals. It is known that the popular autoregressive sieve bootstrap is, in general, not valid for statistics whose (asymptotic) distribution depends on moments of order higher than two, irrespective of whether the data come from a linear time series or a nonlinear one. Inspired by nonlinear system theory, we circumvent this non-validity by introducing a higher-order bootstrap scheme based on the Volterra series representation of the process. To estimate the coefficients of such a representation efficiently, we rely on the alternative formulation of Volterra operators in a reproducing kernel Hilbert space. We perform polynomial kernel regression, which scales linearly with the input dimensionality and is independent of the degree of nonlinearity. We illustrate the applicability of the suggested Volterra-representation-based bootstrap procedure in a simulation study in which we consider strictly stationary linear and nonlinear processes.
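The polynomial kernel regression step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' bootstrap scheme: it covers only the regression of the current value on lagged values with the inhomogeneous polynomial kernel (1 + u·v)^p, fitted as kernel ridge regression. The function names and the ridge penalty are hypothetical choices. The kernel is evaluated from a single inner product, so the cost per kernel evaluation grows linearly with the number of lags and does not depend on the degree p.

```python
import numpy as np


def lag_matrix(series, n_lags):
    """Build rows (x[t-1], ..., x[t-n_lags]) with targets x[t]."""
    x = np.asarray(series, dtype=float)
    columns = [x[n_lags - k - 1 : len(x) - k - 1] for k in range(n_lags)]
    return np.column_stack(columns), x[n_lags:]


def poly_kernel(a, b, degree):
    """Inhomogeneous polynomial kernel between the rows of a and of b."""
    return (1.0 + a @ b.T) ** degree


def fit_kernel_ridge(inputs, targets, degree=2, ridge=1e-3):
    """Solve (K + ridge * I) alpha = targets for the dual weights alpha."""
    gram = poly_kernel(inputs, inputs, degree)
    return np.linalg.solve(gram + ridge * np.eye(len(inputs)), targets)


def predict(alpha, train_inputs, new_inputs, degree=2):
    """Evaluate the fitted regression function at new lag vectors."""
    return poly_kernel(new_inputs, train_inputs, degree) @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(300)
    series = np.zeros(300)
    for t in range(1, 300):
        # Hypothetical bounded nonlinear autoregression, for illustration only.
        series[t] = 0.5 * np.tanh(series[t - 1]) + 0.5 * noise[t]
    lags, targets = lag_matrix(series, n_lags=2)
    alpha = fit_kernel_ridge(lags, targets, degree=2)
    fitted = predict(alpha, lags, lags, degree=2)
    print(float(np.mean((targets - fitted) ** 2)))
```

The residuals `targets - fitted` are the kind of quantity a residual-based bootstrap would resample. How the paper constructs its higher-order bootstrap from the fitted representation is not specified in the abstract and is not attempted here.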
Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary treatments and static interventions, and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by treatment. We present a theoretical study of an (in)direct effect decomposition of the population intervention effect, defined by stochastic interventions jointly applied to the treatment and mediators. In contrast to existing proposals, our causal effects can be evaluated regardless of whether a treatment is categorical or continuous and remain well-defined even in the presence of intermediate confounders affected by treatment. Our (in)direct effects are identifiable without a restrictive assumption on cross-world counterfactual independencies, allowing for substantive conclusions drawn from them to be validated in randomized controlled trials. Beyond the novel effects introduced, we provide a careful study of nonparametric efficiency theory relevant for the construction of flexible, multiply robust estimators of our (in)direct effects, while avoiding undue restrictions induced by assuming parametric models of nuisance parameter functionals. To complement our nonparametric estimation strategy, we introduce inferential techniques for constructing confidence intervals and hypothesis tests, and discuss open source software implementing the proposed methodology.
This paper proposes a new two-stage network mediation method based on a latent network approach -- model-based eigenvalue decomposition -- for analyzing social network data with nodal covariates. In the decomposition stage, no assumption on the metric of the latent space structure is required for the observed network. In the mediation stage, the most important eigenvectors of the network are used as mediators. The method further offers an innovative way of controlling for conditional covariates, so that only the information left in the network after conditioning is considered. We demonstrate the approach with detailed tutorial R code provided for four separate cases -- unconditional and conditional model-based eigenvalue decompositions for either a continuous outcome or a binary outcome -- to show its applicability to empirical network data.
Clustering methods have led to a number of important discoveries in bioinformatics and beyond. A major challenge in their use is determining which clusters represent important underlying structure, as opposed to spurious sampling artifacts. This challenge is especially serious, and very few methods are available, when the data are very high in dimension. Statistical Significance of Clustering (SigClust) is a recently developed cluster evaluation tool for high-dimensional, low-sample-size data. An important component of the SigClust approach is the very definition of a single cluster as a subset of data sampled from a multivariate Gaussian distribution. The implementation of SigClust requires estimating the eigenvalues of the covariance matrix of the null multivariate Gaussian distribution. We show that the original eigenvalue estimation can lead to a test that suffers from severe inflation of type-I error in the important case where there are huge single spikes in the eigenvalues. This paper addresses this critical challenge using a novel likelihood-based soft-thresholding approach to estimate these eigenvalues, which leads to a much improved SigClust. These major improvements in SigClust performance are shown by both theoretical work and an extensive simulation study. Applications to cancer genomic data further demonstrate the usefulness of these improvements.