We investigate the causal effects of drug exposure on birth defects, motivated by a recent cohort study of birth outcomes in pregnancies of women treated with a given medication, which revealed a higher rate of major structural birth defects in infants born to exposed versus unexposed women. An outstanding problem in this study was the missing birth defect outcomes among pregnancy losses resulting from spontaneous abortion. These outcomes are missing not at random because, according to the theory of terathanasia, a fetus with a defect is more likely to be spontaneously aborted. In addition, the previous analysis stratified on live birth versus spontaneous abortion, which is itself a post-exposure variable, so the stratified results do not have a causal interpretation. In this paper we aim to estimate and provide inference for the causal parameters of scientific interest, including the principal effects, making use of the missing data mechanism informed by terathanasia. In the process we also deal with complications in the data, including left truncation, the observational nature of the study, and rare events. We report our findings, which shed light on how studies of the causal effects of medication or other exposures during pregnancy may be analyzed.
The sex ratio at birth (SRB) in India has been reported as imbalanced since the 1970s. Previous studies have shown great variation in the SRB across geographic locations in India up to 2016. Because India is one of the most populous countries and shows great regional heterogeneity, it is crucial to produce probabilistic projections of the SRB at the state level for the purposes of population projection and policy planning. In this paper, we implement a Bayesian hierarchical time series model to project the SRB in India by state. We generate probabilistic SRB projections from 2017 to 2030 for 29 States and Union Territories (UTs) in India, and present results for the 21 States/UTs with data from the Sample Registration System. Our analysis takes into account two state-specific factors that contribute to sex-selective abortion and the resulting sex imbalances at birth: intensity of son preference and fertility squeeze. We project that the largest contribution to female birth deficits is in Uttar Pradesh, with the cumulative number of missing female births projected to be 2.0 (95% credible interval [1.9; 2.2]) million from 2017 to 2030. The total female birth deficit during 2017-2030 for India as a whole is projected to be 6.8 [6.6; 7.0] million.
Our work was motivated by a recent study on birth defects of infants born to pregnant women exposed to a certain medication for treating chronic diseases. Outcomes such as birth defects are rare events in the general population, which often translates to very small numbers of events in the unexposed group. As drug safety studies in pregnancy are typically observational in nature, we control for confounding in this rare events setting using propensity scores (PS). Using our empirical data, we noticed that the estimated odds ratio for birth defects due to exposure varied drastically depending on the specific approach used. The commonly used approaches with PS are matching, stratification, inverse probability weighting (IPW) and regression adjustment. The extremely rare events setting renders matching and stratification infeasible. In addition, the PS itself may be formed via different approaches to selecting confounders from a relatively long list of potential confounders. We carried out simulation experiments to compare different combinations of approaches: IPW or regression adjustment, with 1) including all potential confounders without selection, 2) selection based on univariate association between the candidate variable and the outcome, 3) selection based on change in effects (CIE). The simulation showed that IPW without selection leads to extremely large variances in the estimated odds ratio, which helps to explain the empirical data analysis results that we had observed. The simulation also showed that IPW with selection based on univariate association with the outcome is preferred over IPW with CIE. Regression adjustment yields small variances of the estimated odds ratio regardless of the selection method used.
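To make the IPW idea above concrete, the following is a minimal sketch rather than the procedure used in the study. It assumes a single discrete confounder (called a stratum here), estimates the propensity score as the empirical exposure frequency within each stratum, and computes a weighted odds ratio. The function names and the toy counts are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from the data described above.

```python
from collections import defaultdict


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def crude_odds_ratio(records):
    """Unadjusted odds ratio from (stratum, exposed, outcome) records."""
    events = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for _, a, y in records:
        totals[a] += 1
        events[a] += y
    return odds(events[1] / totals[1]) / odds(events[0] / totals[0])


def ipw_odds_ratio(records):
    """IPW odds ratio; the propensity score is the exposure rate per stratum."""
    n = defaultdict(int)
    n_exposed = defaultdict(int)
    for x, a, _ in records:
        n[x] += 1
        n_exposed[x] += a
    ps = {x: n_exposed[x] / n[x] for x in n}  # estimated P(exposed | stratum)

    w_sum = {0: 0.0, 1: 0.0}   # total weight per exposure arm
    wy_sum = {0: 0.0, 1: 0.0}  # weighted event count per exposure arm
    for x, a, y in records:
        w = 1.0 / ps[x] if a == 1 else 1.0 / (1.0 - ps[x])
        w_sum[a] += w
        wy_sum[a] += w * y
    p1 = wy_sum[1] / w_sum[1]  # weighted risk if everyone were exposed
    p0 = wy_sum[0] / w_sum[0]  # weighted risk if no one were exposed
    return odds(p1) / odds(p0)


def make_group(stratum, exposed, n_total, n_events):
    """Hypothetical helper: build n_total records, the first n_events with outcome 1."""
    return [(stratum, exposed, 1 if i < n_events else 0) for i in range(n_total)]


# Hypothetical toy data: exposure is much more common in stratum 1,
# and stratum 1 also has a higher baseline event rate (confounding).
toy = (make_group(0, 1, 10, 2) + make_group(0, 0, 40, 4)
       + make_group(1, 1, 40, 16) + make_group(1, 0, 10, 2))

print(crude_odds_ratio(toy))  # 4.125
print(ipw_odds_ratio(toy))    # about 2.43
```

In this toy example the crude odds ratio overstates the association because exposed women are concentrated in the higher-risk stratum; weighting by the inverse propensity score removes that imbalance. With very few events and many candidate confounders, the weights can become extreme, which is one way to see why the variance of the IPW estimate can blow up.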
Since December 2019, the world has been witnessing the enormous impact of the unprecedented global COVID-19 pandemic, caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). So far, 38,619,674 confirmed cases and 1,093,522 confirmed deaths due to COVID-19 have been reported. In the United States (US), 7,833,851 cases and 215,199 deaths have been recorded. Several timely studies have discussed the local and global effects of confounding factors on COVID-19 casualties in the US. However, most of these studies paid little attention to the time-varying associations between and among these factors, which are crucial for understanding the outbreak of the present pandemic. Therefore, this study adopts various relevant approaches, including local and global spatial regression models and machine learning, to explore the causal effects of the confounding factors on COVID-19 counts in the contiguous US. In total, five regression models, namely the spatial lag model (SLM), ordinary least squares (OLS), the spatial error model (SEM), geographically weighted regression (GWR) and multiscale geographically weighted regression (MGWR), are fitted at the county scale to take into account the scale effects on modelling. For COVID-19 cases, ethnicity, crime, and income factors are found to be the strongest covariates and explain most of the model variance. For COVID-19 deaths, both migration (domestic and international) and income factors play a crucial role in explaining spatial differences in COVID-19 death counts across counties. The local coefficient of determination (R²) values derived from the GWR and MGWR models are found to be very high over the Wisconsin-Indiana-Michigan (Great Lakes) region, as well as several parts of Texas, California, Mississippi and Arkansas.
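As a rough illustration of how geographically weighted regression differs from a single global OLS fit, the sketch below fits a separate weighted least-squares line at every location, with weights that decay with distance. It assumes one covariate, a Gaussian kernel, and a fixed bandwidth; the function name and toy inputs are hypothetical, and the models in the study above use many covariates and dedicated software.

```python
import math


def gwr_local_fits(coords, x, y, bandwidth):
    """Return one (intercept, slope) pair per location.

    Each fit is a weighted least-squares line in which observation j gets
    the Gaussian kernel weight exp(-0.5 * (d_ij / bandwidth) ** 2), where
    d_ij is the Euclidean distance between locations i and j.
    """
    fits = []
    for ui, vi in coords:
        sw = sx = sy = sxx = sxy = 0.0
        for (uj, vj), xj, yj in zip(coords, x, y):
            d2 = (ui - uj) ** 2 + (vi - vj) ** 2
            w = math.exp(-0.5 * d2 / bandwidth ** 2)
            sw += w
            sx += w * xj
            sy += w * yj
            sxx += w * xj * xj
            sxy += w * xj * yj
        # Closed-form weighted least squares for a single covariate.
        slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
        intercept = (sy - slope * sx) / sw
        fits.append((intercept, slope))
    return fits


# Hypothetical toy data: ten locations on a line, with the covariate effect
# equal to 1 in the western half and 3 in the eastern half.
coords = [(float(i), 0.0) for i in range(10)]
x = [0.0, 1.0, 4.0, 2.0, 7.0, 3.0, 5.0, 9.0, 6.0, 8.0]
y = [xi if i < 5 else 3.0 * xi for i, xi in enumerate(x)]

local_fits = gwr_local_fits(coords, x, y, bandwidth=1.5)
print(local_fits[0][1], local_fits[-1][1])  # local slopes at the two ends
```

A global OLS fit would report one compromise slope for all locations, whereas the local fits recover a smaller slope in the west and a larger one in the east. MGWR extends this idea by allowing a different bandwidth for each covariate, which this sketch does not attempt.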
We develop a distribution-free, unsupervised anomaly detection method called ECAD, which wraps around any regression algorithm and sequentially detects anomalies. Rooted in conformal prediction, ECAD does not require data exchangeability but approximately controls the Type-I error when data are normal. Computationally, it involves no data-splitting and efficiently trains ensemble predictors to increase statistical power. We demonstrate the superior performance of ECAD on detecting anomalous spatio-temporal traffic flow.
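The general flavor of conformal anomaly detection on a sequence can be sketched as follows. This is not ECAD itself: it assumes a deliberately simple predictor (the mean of the previous few observations) instead of an ensemble of regression models, and it uses a sliding window of past absolute residuals as calibration scores. The function name, parameters, and toy series are all hypothetical.

```python
def conformal_anomaly_flags(series, window=20, lag=3, alpha=0.05):
    """Flag points whose conformal p-value is at most alpha.

    The nonconformity score is the absolute residual of a moving-average
    prediction. The p-value compares the current score with the most
    recent `window` past scores.
    """
    scores = []
    flags = []
    for t, value in enumerate(series):
        if t < lag:
            flags.append(False)  # not enough history to predict
            continue
        prediction = sum(series[t - lag:t]) / lag
        score = abs(value - prediction)
        recent = scores[-window:]
        if len(recent) >= window:
            # Fraction of calibration scores at least as extreme as this one.
            n_extreme = sum(1 for s in recent if s >= score)
            p_value = (1 + n_extreme) / (len(recent) + 1)
            flags.append(p_value <= alpha)
        else:
            flags.append(False)  # still collecting calibration scores
        scores.append(score)
    return flags


# Hypothetical toy series: a repeating 10, 11, 12 pattern with one spike.
toy_series = [10 + (i % 3) for i in range(40)]
toy_series[30] = 50

flags = conformal_anomaly_flags(toy_series)
print([i for i, flagged in enumerate(flags) if flagged])  # [30]
```

With `window=20` and `alpha=0.05`, a point is flagged only when its residual exceeds every one of the 20 most recent residuals, since the smallest attainable p-value is 1/21. Swapping the moving average for any other regression model leaves the rest of the logic unchanged, which is the sense in which such a method wraps around a predictor.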
Acute Kidney Injury (AKI) increases patients' mortality, morbidity, and risk of long-term adverse events. Therefore, early identification of AKI may improve renal function recovery, decrease comorbidities, and further improve patient survival. Controlling certain risk factors and developing targeted prevention strategies are important for reducing the risk of AKI. Drug-drug interactions and drug-disease interactions are critical issues for AKI. Typical statistical approaches cannot handle the complexity of drug-drug and drug-disease interactions. In this paper, we propose a novel learning algorithm, Deep Rule Forests (DRF), which discovers rules from multilayer tree models as combinations of drug usages and disease indications to help identify such interactions. We found that several diseases and drug usages have a significant impact on the occurrence of AKI. Our experimental results also show that the DRF model performs comparatively better than typical tree-based and other state-of-the-art algorithms in terms of prediction accuracy and model interpretability.
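The notion of reading rules off a tree model can be illustrated with a small sketch. It is not the DRF algorithm: it assumes a single hand-written binary tree stored as nested dictionaries and simply lists each root-to-leaf path as a conjunction of conditions. The feature names `drug_A` and `disease_B`, the leaf labels, and the tree structure are hypothetical placeholders, not findings from the study.

```python
def extract_rules(node, conditions=()):
    """Return a list of (conditions, label) pairs, one per leaf.

    Internal nodes are dicts with keys "feature", "absent", and "present";
    leaves are dicts with the single key "leaf".
    """
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    feature = node["feature"]
    rules = []
    rules += extract_rules(node["absent"], conditions + (f"{feature}=0",))
    rules += extract_rules(node["present"], conditions + (f"{feature}=1",))
    return rules


# Hypothetical toy tree: risk is labeled high only when a drug and a
# disease indication occur together (an interaction-style rule).
toy_tree = {
    "feature": "drug_A",
    "absent": {"leaf": "low risk"},
    "present": {
        "feature": "disease_B",
        "absent": {"leaf": "low risk"},
        "present": {"leaf": "high risk"},
    },
}

for conds, label in extract_rules(toy_tree):
    print(" AND ".join(conds), "->", label)
```

Each printed line is a conjunction such as `drug_A=1 AND disease_B=1 -> high risk`, which is the kind of human-readable combination of drug usage and disease indication that a rule-based model can expose. In a forest, the same traversal would be applied to every tree, and a layered model could feed the resulting rules forward as new features.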