Modelling a response as a function of high frequency count data: the association between physical activity and fat mass


Added by Nicole Augustin H

Publication date 2014

Fields Mathematical Statistics

Language English

Authors Nicole H. Augustin - Calum Mattocks - Julian J. Faraway

Applications


Abstract

We present a new statistical modelling approach where the response is a function of high-frequency count data. Our application investigates the relationship between the health outcome fat mass and physical activity (PA) measured by accelerometer. The accelerometer quantifies the intensity of physical activity as counts per epoch over a given period of time. We use data from the Avon Longitudinal Study of Parents and Children (ALSPAC), where accelerometer data are available as a time series of accelerometer counts per minute over seven days for a subset of children. In order to compare accelerometer profiles between individuals and to reduce the high dimension, a functional summary of the profiles is used. We use the histogram as a functional summary due to its simplicity, suitability and ease of interpretation. Our model is an extension of generalised regression of scalars on functions, or signal regression. It also allows multi-dimensional functional predictors and additive non-linear predictors for metric covariates. The additive multi-dimensional functional predictors make it possible to investigate specific questions about whether the effect of PA varies over its intensity, by gender, by time of day or by day of the week. The key feature of the model is that it utilises the full profile of measured PA without requiring cut-points defining intensity levels for light, moderate and vigorous activity. We show that the (not necessarily causal) effect of PA is not linear and not constant over the activity intensity. Also, there is little evidence to suggest that the effect of PA intensity varies by gender or by whether the activity takes place on weekdays or at weekends.
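To make the histogram summary and the signal regression idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the bin edges and the second-difference roughness penalty with smoothing parameter `lam` are all illustrative assumptions, and the model in the abstract additionally covers multi-dimensional functional predictors and additive smooth terms for other covariates, which are omitted here.

```python
import numpy as np


def histogram_summary(counts, bin_edges):
    """Summarise one subject's accelerometer counts by a histogram.

    Returns the proportion of epochs in each intensity bin. Counts outside
    the range of bin_edges are ignored, so the edges should cover the data.
    """
    freq, _ = np.histogram(counts, bins=bin_edges)
    return freq / freq.sum()


def fit_signal_regression(H, y, lam):
    """Penalised least squares for y_i = sum_k H[i, k] * beta_k + error.

    H    : (n_subjects, n_bins) matrix of histogram proportions
    y    : (n_subjects,) scalar response, e.g. fat mass
    lam  : smoothing parameter (assumed chosen by the user)

    Each row of H sums to one, so a constant shift in beta plays the role
    of an intercept and no separate intercept column is added. A
    second-difference penalty makes beta vary smoothly over intensity.
    """
    n_bins = H.shape[1]
    D = np.diff(np.eye(n_bins), n=2, axis=0)  # second-difference operator
    return np.linalg.solve(H.T @ H + lam * (D.T @ D), H.T @ y)
```

A typical use would stack one `histogram_summary` row per child into `H`, fit with a few values of `lam`, and plot the estimated `beta` against the bin midpoints to see how the association changes with activity intensity, with no cut-points for light, moderate or vigorous activity required.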


Generalised Joint Regression for Count Data with a Focus on Modelling Football Matches

Hendrik van der Wurp, Andreas Groll, Thomas Kneib, 2019

We propose a versatile joint regression framework for count responses. The method is implemented in the R add-on package GJRM and allows for modelling linear and non-linear dependence through the use of several copulae. Moreover, the parameters of the marginal distributions of the count responses and of the copula can be specified as flexible functions of covariates. Motivated by a football application, we also discuss an extension which forces the regression coefficients of the marginal (linear) predictors to be equal via a suitable penalisation. Model fitting is based on a trust region algorithm which estimates all the parameters of the joint models simultaneously. We investigate the proposal's empirical performance in two simulation studies, the first one designed for arbitrary count data, the other one reflecting football-specific settings. Finally, the method is applied to FIFA World Cup data, showing its competitiveness with the standard approach with regard to predictive performance.

Applications Methodology

Scalar on time-by-distribution regression and its application for modelling associations between daily-living physical activity and cognitive functions in Alzheimer's Disease

Rahul Ghosal, Vijay R. Varma, Dmitri Volfson, 2021

Wearable data is a rich source of information that can provide deeper understanding of links between human behaviours and human health. Existing modelling approaches use wearable data summarized at subject level via scalar summaries using regression techniques, temporal (time-of-day) curves using functional data analysis (FDA), and distributions using distributional data analysis (DDA). We propose to capture temporally local distributional information in wearable data using subject-specific time-by-distribution (TD) data objects. Specifically, we propose scalar on time-by-distribution regression (SOTDR) to model associations between a scalar response of interest, such as a health outcome or disease status, and TD predictors. We show that TD data objects can be parsimoniously represented via a collection of time-varying L-moments that capture distributional changes over the time-of-day. The proposed method is applied to the accelerometry study of mild Alzheimer's disease (AD). Mild AD is found to be significantly associated with reduced maximal level of physical activity, particularly during morning hours. It is also demonstrated that TD predictors attain much stronger associations with clinical cognitive scales of attention, verbal memory, and executive function when compared to predictors summarized via scalar total activity counts, temporal functional curves, and quantile functions. Taken together, the present results suggest that the SOTDR analysis provides novel insights into cognitive function and AD.
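As an illustration of the time-varying L-moment representation mentioned above, the sketch below computes the first four sample L-moments from probability-weighted moments and evaluates them in consecutive time-of-day windows. It covers only the construction of the summary, not the SOTDR regression itself; the function names, the (days × minutes) array layout and the 60-minute window are assumptions made for this example.

```python
import numpy as np


def sample_l_moments(x):
    """First four sample L-moments via unbiased probability-weighted moments.

    Requires at least four observations.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(n)  # zero-based rank of each order statistic
    b0 = x.mean()
    b1 = np.sum(i * x) / (n * (n - 1))
    b2 = np.sum(i * (i - 1) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum(i * (i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    return np.array([
        b0,                                # L-location (mean)
        2 * b1 - b0,                       # L-scale
        6 * b2 - 6 * b1 + b0,              # third L-moment
        20 * b3 - 30 * b2 + 12 * b1 - b0,  # fourth L-moment
    ])


def time_varying_l_moments(activity, window=60):
    """L-moments of activity within consecutive time-of-day windows.

    activity : (n_days, n_minutes) array for one subject
    window   : window length in minutes (assumed value)
    Returns an (n_windows, 4) array, pooling all days within each window.
    """
    activity = np.asarray(activity, dtype=float)
    n_windows = activity.shape[1] // window
    return np.vstack([
        sample_l_moments(activity[:, w * window:(w + 1) * window].ravel())
        for w in range(n_windows)
    ])
```

For a subject with seven days of minute-level data, `time_varying_l_moments` returns a 24 × 4 array whose rows trace how the location, spread and shape of the activity distribution change over the day.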

Applications

Understanding links between water-quality variables and nitrate concentration in freshwater streams using high-frequency sensor data

Claire Kermorvant, Benoit Liquet, Guy Litt, 2021

Real-time monitoring using in situ sensors is becoming a common approach for measuring water quality within watersheds. High-frequency measurements produce big data sets that present opportunities to conduct new analyses for improved understanding of water quality dynamics and more effective management of rivers and streams. Of primary importance is enhancing knowledge of the relationships between nitrate, one of the most reactive forms of inorganic nitrogen in the aquatic environment, and other water quality variables. We analysed high-frequency water quality data from in situ sensors deployed in three sites from different watersheds and climate zones within the National Ecological Observatory Network, USA. We used generalised additive mixed models to explain the nonlinear relationships at each site between nitrate concentration and conductivity, turbidity, dissolved oxygen, water temperature, and elevation. Temporal autocorrelation was modelled with an autoregressive moving average model and we examined the relative importance of the explanatory variables. Total deviance explained by the models was high for all sites. Although variable importance and the smooth regression parameters differed among sites, the models explaining the most variation in nitrate contained the same explanatory variables. This study demonstrates that building a model for nitrate using the same set of explanatory water quality variables is achievable, even for sites with vastly different environmental and climatic characteristics. Applying such models will help managers select cost-effective water quality variables to monitor when the goals are to gain a spatially and temporally in-depth understanding of nitrate dynamics and adapt management plans accordingly.

Applications

Bayesian GARMA Models for Count Data

Marinho G. Andrade, Ricardo S. Ehlers, Breno S. Andrade, 2015

Generalized autoregressive moving average (GARMA) models are a class of models developed to extend the univariate Gaussian ARMA time series model to a flexible observation-driven model for non-Gaussian time series data. This work presents a Bayesian approach for GARMA models with Poisson, binomial and negative binomial distributions. A simulation study was carried out to investigate the performance of Bayesian estimation and Bayesian model selection criteria. Three real datasets were also analysed using the Bayesian approach to GARMA models.
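To show what "observation-driven" means for count data, the following sketch simulates a Poisson GARMA(1,1) series with a log link and an intercept-only linear predictor. It follows the usual GARMA recursion, in which past observations enter through the link function after being truncated below at a small constant `c` so that the logarithm is defined at zero counts. The function name, parameter values and the choice `c = 0.1` are assumptions for illustration, and the sketch covers simulation only, not the Bayesian estimation described in the abstract.

```python
import numpy as np


def simulate_poisson_garma11(n, beta0, phi, theta, c=0.1, seed=0):
    """Simulate a Poisson GARMA(1,1) series with log link.

    eta_t = beta0 + phi * (log y*_{t-1} - beta0)
                  + theta * (log y*_{t-1} - eta_{t-1}),
    where y*_{t-1} = max(y_{t-1}, c) and y_t ~ Poisson(exp(eta_t)).
    Returns the simulated counts and their conditional means.
    """
    rng = np.random.default_rng(seed)
    y = np.zeros(n, dtype=np.int64)
    eta = np.full(n, beta0, dtype=float)  # eta_0 starts at the intercept
    y[0] = rng.poisson(np.exp(eta[0]))
    for t in range(1, n):
        g_prev = np.log(max(y[t - 1], c))  # link of truncated observation
        eta[t] = beta0 + phi * (g_prev - beta0) + theta * (g_prev - eta[t - 1])
        y[t] = rng.poisson(np.exp(eta[t]))
    return y, np.exp(eta)
```

Calling, for instance, `simulate_poisson_garma11(200, beta0=2.0, phi=0.5, theta=0.2)` yields a count series whose conditional mean depends on the previous observation, which is the kind of data the estimation methods in this paper are designed for.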

Applications

Multiscale Analysis of Count Data through Topic Alignment

Julia Fukuyama, Kris Sankaran, Laura Symul, 2021

Topic modeling is a popular method used to describe biological count data. With topic models, the user must specify the number of topics $K$. Since there is no definitive way to choose $K$ and since a true value might not exist, we develop techniques to study the relationships across models with different $K$. This can show how many topics are consistently present across different models, if a topic is only transiently present, or if a topic splits in two when $K$ increases. This strategy gives more insight into the process generating the data than choosing a single value of $K$ would. We design a visual representation of these cross-model relationships, which we call a topic alignment, and present three diagnostics based on it. We show the effectiveness of these tools for interpreting the topics on simulated and real data, and we release an accompanying R package, alto (https://lasy.github.io/alto).

Applications Computation