In repeated measures factorial designs involving clustered units, parametric methods such as linear mixed effects models are used to handle within-subject correlations. However, the assumptions of these parametric models, such as continuity and normality, are often not satisfied in practice, and the homoscedasticity assumption is hard to verify. Furthermore, these assumptions may not even be realistic when data are measured on a non-metric scale, as commonly happens, for example, in Quality of Life outcomes. In this article, nonparametric effect-size measures for clustered data in factorial designs with pre-post measurements are introduced. The effect-size measures provide intuitively interpretable and informative probabilistic comparisons of treatment and time effects. The dependence among observations within a cluster can be arbitrary across treatment groups. The effect-size estimators, along with their asymptotic properties for computing confidence intervals and performing hypothesis tests, are discussed. ANOVA-type statistics with $\chi^2$ approximation that retain some of the optimal asymptotic behaviors in small samples are investigated. Within each treatment group, we allow some clusters to involve observations measured in both the pre- and post-intervention periods (referred to as complete clusters), while others contain observations from either the pre- or the post-intervention period only (referred to as incomplete clusters). Our methods are shown to be particularly effective in the presence of multiple forms of clustering. The developed nonparametric methods are illustrated with data from a three-arm Randomized Trial of Indoor Wood Smoke reduction. The study considered two active treatments to improve asthma symptoms of children living in homes that use wood stoves for heating.
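As a rough illustration of what a "probabilistic comparison" can look like, the sketch below estimates the usual nonparametric relative effect $p = P(X < Y) + \tfrac{1}{2} P(X = Y)$ from two samples. It is an assumption-laden simplification: the function name `relative_effect` and the toy data are hypothetical, observations are treated as independent, and the clustering and incomplete-cluster handling described in the abstract are not modeled.

```python
from itertools import product


def relative_effect(x, y):
    """Estimate p = P(X < Y) + 0.5 * P(X = Y) by comparing all pairs.

    Assumes independent observations; clustering is ignored in this sketch.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    score = 0.0
    for xi, yj in product(x, y):
        if xi < yj:
            score += 1.0   # y tends to be larger
        elif xi == yj:
            score += 0.5   # ties count half, so ordinal data are handled
    return score / (len(x) * len(y))


# Hypothetical ordinal scores before and after an intervention.
pre = [1, 2, 2, 3]
post = [2, 3, 3, 4]
p_hat = relative_effect(pre, post)
```

A value of `p_hat` near 0.5 suggests no tendency for one sample to exceed the other, while values near 0 or 1 indicate a strong tendency in one direction. Only the ordering of the values is used, which is why such a measure remains meaningful on non-metric scales.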
This article concerns a class of generalized linear mixed models for clustered data, where the random effects are mapped uniquely onto the grouping structure and are independent between groups. We derive necessary and sufficient conditions that enable the marginal likelihood of such a class of models to be expressed in closed form. Illustrations are provided using the Gaussian, Poisson, binomial and gamma distributions. These models are unified under a single umbrella of conjugate generalized linear mixed models, where conjugate refers to the fact that the marginal likelihood can be expressed in closed form, rather than implying inference via the Bayesian paradigm. Having an explicit marginal likelihood means that these models are more computationally convenient, which can be important in big data contexts. Except for the binomial distribution, these models are able to achieve simultaneous conjugacy, and are thus able to accommodate both unit- and group-level covariates.
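To make the idea of a closed-form marginal likelihood concrete, here is a minimal sketch for one cluster of Poisson counts with a shared multiplicative gamma random effect. The parameterization is an assumption chosen for illustration (the textbook Poisson-gamma pairing), not necessarily the one used in the article: $y_j \mid u \sim \mathrm{Poisson}(\mu_j u)$ and $u \sim \mathrm{Gamma}(a, b)$ with rate $b$. Integrating out $u$ gives $\prod_j \frac{\mu_j^{y_j}}{y_j!} \cdot \frac{b^a}{\Gamma(a)} \cdot \frac{\Gamma(a + \sum_j y_j)}{(b + \sum_j \mu_j)^{a + \sum_j y_j}}$. The function name is hypothetical.

```python
import math


def log_marginal_poisson_gamma(y, mu, shape, rate):
    """Closed-form log marginal likelihood of one cluster.

    Assumed model (illustrative only):
        y[j] | u ~ Poisson(mu[j] * u),   u ~ Gamma(shape, rate)
    The random effect u is integrated out analytically.
    """
    total_count = sum(y)
    total_mean = sum(mu)
    # Terms of the Poisson likelihood that do not involve u.
    log_const = sum(
        yj * math.log(mj) - math.lgamma(yj + 1) for yj, mj in zip(y, mu)
    )
    # Ratio of gamma normalizing constants after integrating over u.
    return (
        log_const
        + shape * math.log(rate)
        - math.lgamma(shape)
        + math.lgamma(shape + total_count)
        - (shape + total_count) * math.log(rate + total_mean)
    )


# Hypothetical cluster with three counts and fixed-effect means.
value = log_marginal_poisson_gamma([2, 0, 1], [1.5, 0.5, 1.0], shape=2.0, rate=2.0)
```

Because no numerical integration is needed, the log-likelihood of a whole data set is a sum of such terms over clusters, which is the computational convenience the abstract refers to.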
We develop Bayesian nonparametric models for spatially indexed data of mixed type. Our work is motivated by challenges that occur in environmental epidemiology, where the usual presence of several confounding variables that exhibit complex interactions and high correlations makes it difficult to estimate and understand the effects of risk factors on health outcomes of interest. The modeling approach we adopt assumes that responses and confounding variables are manifestations of continuous latent variables, and uses multivariate Gaussians to jointly model these. Responses and confounding variables are not treated equally, as only the relevant parameters of the distributions of the responses are modeled in terms of explanatory variables or risk factors. Spatial dependence is introduced by allowing the weights of the nonparametric process priors to be location specific, obtained as probit transformations of Gaussian Markov random fields. Confounding variables and spatial configuration have a similar role in the model, in that they only influence, along with the responses, the allocation probabilities of the areas into the mixture components. This allows for flexible adjustment of the effects of observed confounders, while also allowing for residual spatial structure, possibly due to unmeasured or undiscovered spatially varying factors. Aspects of the model are illustrated in simulation studies and an application to a real data set.
In spatial statistics, it is often assumed that the spatial field of interest is stationary and its covariance has a simple parametric form, but these assumptions are not appropriate in many applications. Given replicate observations of a Gaussian spatial field, we propose nonstationary and nonparametric Bayesian inference on the spatial dependence. Instead of estimating the quadratic (in the number of spatial locations) entries of the covariance matrix, the idea is to infer a near-linear number of nonzero entries in a sparse Cholesky factor of the precision matrix. Our prior assumptions are motivated by recent results on the exponential decay of the entries of this Cholesky factor for Matérn-type covariances under a specific ordering scheme. Our methods are highly scalable and parallelizable. We conduct numerical comparisons and apply our methodology to climate-model output, enabling statistical emulation of an expensive physical model.
The simultaneous estimation of many parameters $\eta_i$, based on a corresponding set of observations $x_i$, for $i=1,\ldots, n$, is a key research problem that has received renewed attention in the high-dimensional setting. Many practical situations involve heterogeneous data $(x_i, \theta_i)$ where $\theta_i$ is a known nuisance parameter. Effectively pooling information across samples while correctly accounting for heterogeneity presents a significant challenge in large-scale estimation problems. We address this issue by introducing the Nonparametric Empirical Bayes Smoothing Tweedie (NEST) estimator, which efficiently estimates $\eta_i$ and properly adjusts for heterogeneity via a generalized version of Tweedie's formula. NEST is capable of handling a wider range of settings than previously proposed heterogeneous approaches as it does not make any parametric assumptions on the prior distribution of $\eta_i$. The estimation framework is simple but general enough to accommodate any member of the exponential family of distributions. Our theoretical results show that NEST is asymptotically optimal, while simulation studies show that it outperforms competing methods, with substantial efficiency gains in many settings. The method is demonstrated on a data set measuring the performance gap in math scores between socioeconomically advantaged and disadvantaged students in K-12 schools.
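For readers unfamiliar with Tweedie's formula, the sketch below shows its classical form for normal observations with a common, known variance $\sigma^2$: the posterior mean is $x + \sigma^2 \, \frac{d}{dx} \log f(x)$, where $f$ is the marginal density of the data. This is only the homoscedastic special case and not the NEST estimator itself, which generalizes the formula to heterogeneous $\theta_i$. The marginal density is estimated here with a Gaussian kernel; the function name and the fixed `bandwidth` argument are assumptions made for illustration.

```python
import math


def tweedie_normal(x, sample, sigma, bandwidth):
    """Empirical Bayes estimate x + sigma^2 * d/dx log f(x).

    f is a Gaussian kernel density estimate built from `sample`.
    Assumes every observation shares the same known sigma
    (illustrative special case, not the heterogeneous NEST estimator).
    """
    # Unnormalized Gaussian kernel weights; constants cancel in f'/f.
    weights = [math.exp(-0.5 * ((x - xj) / bandwidth) ** 2) for xj in sample]
    total = sum(weights)
    if total == 0.0:
        return x  # density estimate underflowed, so apply no shrinkage
    # Score f'(x) / f(x) of the kernel density estimate.
    score = sum(w * (xj - x) for w, xj in zip(weights, sample)) / (
        bandwidth ** 2 * total
    )
    return x + sigma ** 2 * score


# Hypothetical observations; each one is adjusted using all the others.
observations = [-0.4, 0.1, 0.3, 2.5]
estimates = [
    tweedie_normal(x, observations, sigma=1.0, bandwidth=1.0) for x in observations
]
```

The correction term moves each observation toward regions where the estimated marginal density is higher, which is how information is pooled across samples without specifying a parametric prior.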
We give an expository review of applications of computational algebraic statistics to the design and analysis of fractional factorial experiments, based on our recent work. For the purpose of design, the techniques of Gröbner bases and indicator functions allow us to treat fractional factorial designs without distinction between regular and non-regular designs. For the purpose of analyzing data from fractional factorial designs, the techniques of Markov bases allow us to handle discrete observations. Thus the approach of computational algebraic statistics greatly enlarges the scope of fractional factorial designs.