No Arabic abstract
Hierarchical inference in (generalized) regression problems is powerful for finding significant groups or even single covariates, especially in high-dimensional settings where identifiability of the entire regression parameter vector may be ill-posed. The general method proceeds in a fully data-driven and adaptive way from large to small groups or singletons of covariates, depending on the signal strength and the correlation structure of the design matrix. We propose a novel hierarchical multiple testing adjustment that can be used in combination with any significance test for a group of covariates to perform hierarchical inference. Our adjustment passes on the significance level of certain hypotheses that could not be rejected and is shown to guarantee strong control of the familywise error rate. Our method is at least as powerful as a so-called depth-wise hierarchical Bonferroni adjustment. It provides a substantial gain in power over other previously proposed inheritance hierarchical procedures if the underlying alternative hypotheses occur sparsely along a few branches in the tree-structured hierarchy.
The Consent-to-Contact (C2C) registry at the University of California, Irvine collects data from community participants to aid in the recruitment to clinical research studies. Self-selection into the C2C likely leads to bias due in part to enrollees having more years of education relative to the US general population. Salazar et al. (2020) recently used the C2C to examine associations of race/ethnicity with participant willingness to be contacted about research studies. To address questions about generalizability of estimated associations we estimate propensity for self-selection into the convenience sample weights using data from the National Health and Nutrition Examination Survey (NHANES). We create a combined dataset of C2C and NHANES subjects and compare different approaches (logistic regression, covariate balancing propensity score, entropy balancing, and random forest) for estimating the probability of membership in C2C relative to NHANES. We propose methods to estimate the variance of parameter estimates that account for uncertainty that arises from estimating propensity weights. Simulation studies explore the impact of propensity weight estimation on uncertainty. We demonstrate the approach by repeating the analysis by Salazar et al. with the deduced propensity weights for the C2C subjects and contrast the results of the two analyses. This method can be implemented using our estweight package in R available on GitHub.
High-dimensional group inference is an essential part of statistical methods for analysing complex data sets, including hierarchical testing, tests of interaction, detection of heterogeneous treatment effects and inference for local heritability. Group inference in regression models can be measured with respect to a weighted quadratic functional of the regression sub-vector corresponding to the group. Asymptotically unbiased estimators of these weighted quadratic functionals are constructed and a novel procedure using these estimators for inference is proposed. We derive its asymptotic Gaussian distribution which enables the construction of asymptotically valid confidence intervals and tests which perform well in terms of length or power. The proposed test is computationally efficient even for a large group, statistically valid for any group size and achieving good power performance for testing large groups with many small regression coefficients. We apply the methodology to several interesting statistical problems and demonstrate its strength and usefulness on simulated and real data.
Analyses of environmental phenomena often are concerned with understanding unlikely events such as floods, heatwaves, droughts or high concentrations of pollutants. Yet the majority of the causal inference literature has focused on modelling means, rather than (possibly high) quantiles. We define a general estimator of the population quantile treatment (or exposure) effects (QTE) -- the weighted QTE (WQTE) -- of which the population QTE is a special case, along with a general class of balancing weights incorporating the propensity score. Asymptotic properties of the proposed WQTE estimators are derived. We further propose and compare propensity score regression and two weighted methods based on these balancing weights to understand the causal effect of an exposure on quantiles, allowing for the exposure to be binary, discrete or continuous. Finite sample behavior of the three estimators is studied in simulation. The proposed methods are applied to data taken from the Bavarian Danube catchment area to estimate the 95% QTE of phosphorus on copper concentration in the river.
Models defined by stochastic differential equations (SDEs) allow for the representation of random variability in dynamical systems. The relevance of this class of models is growing in many applied research areas and is already a standard tool to model e.g. financial, neuronal and population growth dynamics. However inference for multidimensional SDE models is still very challenging, both computationally and theoretically. Approximate Bayesian computation (ABC) allow to perform Bayesian inference for models which are sufficiently complex that the likelihood function is either analytically unavailable or computationally prohibitive to evaluate. A computationally efficient ABC-MCMC algorithm is proposed, halving the running time in our simulations. Focus is on the case where the SDE describes latent dynamics in state-space models; however the methodology is not limited to the state-space framework. Simulation studies for a pharmacokinetics/pharmacodynamics model and for stochastic chemical reactions are considered and a MATLAB package implementing our ABC-MCMC algorithm is provided.
We propose a framework for Bayesian non-parametric estimation of the rate at which new infections occur assuming that the epidemic is partially observed. The developed methodology relies on modelling the rate at which new infections occur as a function which only depends on time. Two different types of prior distributions are proposed namely using step-functions and B-splines. The methodology is illustrated using both simulated and real datasets and we show that certain aspects of the epidemic such as seasonality and super-spreading events are picked up without having to explicitly incorporate them into a parametric model.