No Arabic abstract
Subjective wellness data can provide important information on the well-being of athletes and be used to maximize player performance and detect and prevent against injury. Wellness data, which are often ordinal and multivariate, include metrics relating to the physical, mental, and emotional status of the athlete. Training and recovery can have significant short- and long-term effects on athlete wellness, and these effects can vary across individual. We develop a joint multivariate latent factor model for ordinal response data to investigate the effects of training and recovery on athlete wellness. We use a latent factor distributed lag model to capture the cumulative effects of training and recovery through time. Current efforts using subjective wellness data have averaged over these metrics to create a univariate summary of wellness, however this approach can mask important information in the data. Our multivariate model leverages each ordinal variable and can be used to identify the relative importance of each in monitoring athlete wellness. The model is applied to athlete daily wellness, training, and recovery data collected across two Major League Soccer seasons.
Methods for addressing missing data have become much more accessible to applied researchers. However, little guidance exists to help researchers systematically identify plausible missing data mechanisms in order to ensure that these methods are appropriately applied. Two considerations motivate the present study. First, psychological research is typically characterized by a large number of potential response variables that may be observed across multiple waves of data collection. This situation makes it more challenging to identify plausible missing data mechanisms than is the case in other fields such as biostatistics where a small number of dependent variables is typically of primary interest and the main predictor of interest is statistically independent of other covariates. Second, there is growing recognition of the importance of systematic approaches to sensitivity analyses for treatment of missing data in psychological science. We develop and apply a systematic approach for reducing a large number of observed patterns and demonstrate how these can be used to explore potential missing data mechanisms within multivariate contexts. A large scale simulation study is used to guide suggestions for which approaches are likely to be most accurate as a function of sample size, number of factors, number of indicators per factor, and proportion of missing data. Three applications of this approach to data examples suggest that the method appears useful in practice.
Exposures to environmental chemicals during gestation can alter health status later in life. Most studies of maternal exposure to chemicals during pregnancy have focused on a single chemical exposure observed at high temporal resolution. Recent research has turned to focus on exposure to mixtures of multiple chemicals, generally observed at a single time point. We consider statistical methods for analyzing data on chemical mixtures that are observed at a high temporal resolution. As motivation, we analyze the association between exposure to four ambient air pollutants observed weekly throughout gestation and birth weight in a Boston-area prospective birth cohort. To explore patterns in the data, we first apply methods for analyzing data on (1) a single chemical observed at high temporal resolution, and (2) a mixture measured at a single point in time. We highlight the shortcomings of these approaches for temporally-resolved data on exposure to chemical mixtures. Second, we propose a novel method, a Bayesian kernel machine regression distributed lag model (BKMR-DLM), that simultaneously accounts for nonlinear associations and interactions among time-varying measures of exposure to mixtures. BKMR-DLM uses a functional weight for each exposure that parameterizes the window of susceptibility corresponding to that exposure within a kernel machine framework that captures non-linear and interaction effects of the multivariate exposure on the outcome. In a simulation study, we show that the proposed method can better estimate the exposure-response function and, in high signal settings, can identify critical windows in time during which exposure has an increased association with the outcome. Applying the proposed method to the Boston birth cohort data, we find evidence of a negative association between organic carbon and birth weight and that nitrate modifies the organic carbon, ...
Mass cytometry technology enables the simultaneous measurement of over 40 proteins on single cells. This has helped immunologists to increase their understanding of heterogeneity, complexity, and lineage relationships of white blood cells. Current statistical methods often collapse the rich single-cell data into summary statistics before proceeding with downstream analysis, discarding the information in these multivariate datasets. In this article, our aim is to exhibit the use of statistical analyses on the raw, uncompressed data thus improving replicability, and exposing multivariate patterns and their associated uncertainty profiles. We show that multivariate generative models are a valid alternative to univariate hypothesis testing. We propose two models: a multivariate Poisson log-normal mixed model and a logistic linear mixed model. We show that these models are complementary and that either model can account for different confounders. We use Hamiltonian Monte Carlo to provide Bayesian uncertainty quantification. Our models applied to a recent pregnancy study successfully reproduce key findings while quantifying increased overall protein-to-protein correlations between first and third trimester.
Microorganisms play critical roles in human health and disease. It is well known that microbes live in diverse communities in which they interact synergistically or antagonistically. Thus for estimating microbial associations with clinical covariates, multivariate statistical models are preferred. Multivariate models allow one to estimate and exploit complex interdependencies among multiple taxa, yielding more powerful tests of exposure or treatment effects than application of taxon-specific univariate analyses. In addition, the analysis of microbial count data requires special attention because data commonly exhibit zero inflation. To meet these needs, we developed a Bayesian variable selection model for multivariate count data with excess zeros that incorporates information on the covariance structure of the outcomes (counts for multiple taxa), while estimating associations with the mean levels of these outcomes. Although there has been a great deal of effort in zero-inflated models for longitudinal data, little attention has been given to high-dimensional multivariate zero-inflated data modeled via a general correlation structure. Through simulation, we compared performance of the proposed method to that of existing univariate approaches, for both the binary and count parts of the model. When outcomes were correlated the proposed variable selection method maintained type I error while boosting the ability to identify true associations in the binary component of the model. For the count part of the model, in some scenarios the the univariate method had higher power than the multivariate approach. This higher power was at a cost of a highly inflated false discovery rate not observed with the proposed multivariate method. We applied the approach to oral microbiome data from the Pediatric HIV/AIDS Cohort Oral Health Study and identified five species (of 44) associated with HIV infection.
Factor analysis is a flexible technique for assessment of multivariate dependence and codependence. Besides being an exploratory tool used to reduce the dimensionality of multivariate data, it allows estimation of common factors that often have an interesting theoretical interpretation in real problems. However, standard factor analysis is only applicable when the variables are scaled, which is often inappropriate, for example, in data obtained from questionnaires in the field of psychology,where the variables are often categorical. In this framework, we propose a factor model for the analysis of multivariate ordered and non-ordered polychotomous data. The inference procedure is done under the Bayesian approach via Markov chain Monte Carlo methods. Two Monte-Carlo simulation studies are presented to investigate the performance of this approach in terms of estimation bias, precision and assessment of the number of factors. We also illustrate the proposed method to analyze participants responses to the Motivational State Questionnaire dataset, developed to study emotions in laboratory and field settings.