Methods for addressing missing data have become much more accessible to applied researchers. However, little guidance exists to help researchers systematically identify plausible missing data mechanisms in order to ensure that these methods are appropriately applied. Two considerations motivate the present study. First, psychological research is typically characterized by a large number of potential response variables that may be observed across multiple waves of data collection. This situation makes it more challenging to identify plausible missing data mechanisms than is the case in other fields such as biostatistics, where a small number of dependent variables is typically of primary interest and the main predictor of interest is statistically independent of other covariates. Second, there is growing recognition of the importance of systematic approaches to sensitivity analyses for the treatment of missing data in psychological science. We develop and apply a systematic approach for reducing a large number of observed missing data patterns and demonstrate how the reduced patterns can be used to explore potential missing data mechanisms within multivariate contexts. A large-scale simulation study is used to guide suggestions for which approaches are likely to be most accurate as a function of sample size, number of factors, number of indicators per factor, and proportion of missing data. Three applications of this approach to data examples suggest that the method is useful in practice.
There is strong interest among payers in identifying emerging healthcare cost drivers to support early intervention. However, many challenges arise in analyzing large, high-dimensional, and noisy healthcare data. In this paper, we propose a systematic approach that uses hierarchical and multi-resolution search strategies with enhanced statistical process control (SPC) algorithms to surface high-impact cost drivers. Our approach aims to provide interpretable, detailed, and actionable insights into detected change patterns attributable to multiple demographic and clinical factors. We also propose an algorithm to identify comparable treatment offsets at the population level and quantify the cost impact of their utilization changes.
Subjective wellness data can provide important information on the well-being of athletes and can be used to maximize player performance and to detect and prevent injury. Wellness data, which are often ordinal and multivariate, include metrics relating to the physical, mental, and emotional status of the athlete. Training and recovery can have significant short- and long-term effects on athlete wellness, and these effects can vary across individuals. We develop a joint multivariate latent factor model for ordinal response data to investigate the effects of training and recovery on athlete wellness. We use a latent factor distributed lag model to capture the cumulative effects of training and recovery through time. Current efforts using subjective wellness data have averaged over these metrics to create a univariate summary of wellness; however, this approach can mask important information in the data. Our multivariate model leverages each ordinal variable and can be used to identify the relative importance of each in monitoring athlete wellness. The model is applied to athlete daily wellness, training, and recovery data collected across two Major League Soccer seasons.
The broad concept of emergence is instrumental in several of the most challenging open scientific questions, yet few quantitative theories of what constitutes emergent phenomena have been proposed. This article introduces a formal theory of causal emergence in multivariate systems, which studies the relationship between the dynamics of parts of a system and macroscopic features of interest. Our theory provides a quantitative definition of downward causation and introduces a complementary modality of emergent behaviour, which we refer to as causal decoupling. Moreover, the theory yields practical criteria that can be efficiently calculated in large systems, making our framework applicable in a range of scenarios of practical interest. We illustrate our findings in a number of case studies, including Conway's Game of Life, Reynolds' flocking model, and neural activity as measured by electrocorticography.
There is strong interest among healthcare payers in identifying emerging healthcare cost drivers to support early intervention. However, many challenges arise in analyzing large, high-dimensional, and noisy healthcare data. In this paper, we propose a systematic approach that uses hierarchical search strategies and enhanced statistical process control (SPC) algorithms to surface high-impact cost drivers. Our approach aims to provide interpretable, detailed, and actionable insights into detected change patterns attributable to multiple clinical factors. We also propose an algorithm to identify comparable treatment offsets at the population level and quantify the cost impact of their utilization changes. To illustrate our approach, we apply it to the IBM Watson Health MarketScan Commercial Database and organize the detected emerging drivers into five categories for reporting. We also discuss some findings from this analysis and potential actions for mitigating the impact of the drivers.
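As a rough illustration of the kind of statistical process control check mentioned above, the sketch below implements a standard one-sided CUSUM chart for detecting an upward shift in a cost series. It is not the enhanced SPC algorithm of the paper, which the abstract does not specify; the function name `cusum_upper`, its parameters, and the cost values are all hypothetical and chosen only for illustration.

```python
def cusum_upper(series, target, slack, threshold):
    """Return the index of the first upward-shift alarm, or None.

    Standard one-sided CUSUM: accumulate deviations above
    (target + slack) and signal once the running sum exceeds threshold.
    All parameter names are illustrative assumptions.
    """
    s = 0.0
    for i, x in enumerate(series):
        # Reset to zero whenever the cumulative deviation turns negative.
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return i
    return None


# Hypothetical monthly cost values, invented for this example.
costs = [100, 102, 98, 101, 99, 100, 110, 112, 111, 113]
alarm = cusum_upper(costs, target=100.0, slack=2.0, threshold=15.0)
print(alarm)
```

In a hierarchical search of the sort the abstract describes, a check like this could plausibly be run on each subgroup of the data, but how the subgroups are formed and ranked is not stated in the source.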
A severe case of scientific misconduct was discovered in a paper from 2005 allegedly showing harmful effects (DNA breakage) of non-thermal mobile phone electromagnetic field exposure on human and rat cells. Here we describe how the fraudulent data were identified. The variation in the reported biological data is shown to be below the theoretical lower limits implied by multinomial distributions. Another reason for doubt was a highly significant non-uniform distribution of last digits, a known hint of data fabrication. The Medical University Vienna, where the research was conducted, was informed about these findings and came to the conclusion that the data in this and another, related paper by the same group were fabricated, and that both papers should be retracted.
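The last-digit check mentioned above can be sketched as a chi-square goodness-of-fit test against a uniform distribution over the digits 0 to 9. This is a minimal sketch of the general technique, not the authors' actual analysis; the function name `last_digit_chi2` and both data sets are hypothetical, and the critical value is the standard table value for 9 degrees of freedom at the 5% level.

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at the 5% level
# (standard table value).
CHI2_CRIT_DF9_05 = 16.919


def last_digit_chi2(values):
    """Return the chi-square statistic for uniformity of last digits.

    Assumes integer-valued observations; the name is illustrative.
    """
    digits = [abs(int(v)) % 10 for v in values]
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


# Hypothetical data, invented for illustration only.
uniform_like = list(range(100, 200))  # each last digit occurs 10 times
skewed = [105] * 40 + [110] * 40 + list(range(120, 140))  # digits 0 and 5 dominate

stat_uniform = last_digit_chi2(uniform_like)
stat_skewed = last_digit_chi2(skewed)
print(stat_uniform, stat_skewed, stat_skewed > CHI2_CRIT_DF9_05)
```

A statistic above the critical value indicates that the last digits deviate from uniformity more than chance would readily explain. On its own this is only a hint, as the abstract notes, since rounding conventions or instrument resolution can also skew last digits.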