No Arabic abstract
We propose a method to test for the presence of differential ascertainment in case-control studies, when data are collected by multiple sources. We show that, when differential ascertainment is present, the use of only the observed cases leads to severe bias in the computation of the odds ratio. We can alleviate the effect of such bias using the estimates that our method of testing for differential ascertainment naturally provides. We apply it to a dataset obtained from the National Violent Death Reporting System, with the goal of checking for the presence of differential ascertainment by race in the count of deaths caused by child maltreatment.
Identifying which taxa in our microbiota are associated with traits of interest is important for advancing science and health. However, the identification is challenging because the measured vector of taxa counts (by amplicon sequencing) is compositi
Recent work in fair machine learning has proposed dozens of technical definitions of algorithmic fairness and methods for enforcing these definitions. However, we still lack an understanding of how to develop machine learning systems with fairness criteria that reflect relevant stakeholders nuanced viewpoints in real-world contexts. To address this gap, we propose a framework for eliciting stakeholders subjective fairness notions. Combining a user interface that allows stakeholders to examine the data and the algorithms predictions with an interview protocol to probe stakeholders thoughts while they are interacting with the interface, we can identify stakeholders fairness beliefs and principles. We conduct a user study to evaluate our framework in the setting of a child maltreatment predictive system. Our evaluations show that the framework allows stakeholders to comprehensively convey their fairness viewpoints. We also discuss how our results can inform the design of predictive systems.
Knockoffs provide a general framework for controlling the false discovery rate when performing variable selection. Much of the Knockoffs literature focuses on theoretical challenges and we recognize a need for bringing some of the current ideas into practice. In this paper we propose a sequential algorithm for generating knockoffs when underlying data consists of both continuous and categorical (factor) variables. Further, we present a heuristic multiple knockoffs approach that offers a practical assessment of how robust the knockoff selection process is for a given data set. We conduct extensive simulations to validate performance of the proposed methodology. Finally, we demonstrate the utility of the methods on a large clinical data pool of more than $2,000$ patients with psoriatic arthritis evaluated in 4 clinical trials with an IL-17A inhibitor, secukinumab (Cosentyx), where we determine prognostic factors of a well established clinical outcome. The analyses presented in this paper could provide a wide range of applications to commonly encountered data sets in medical practice and other fields where variable selection is of particular interest.
Large renewable energy projects, such as large offshore wind farms, are critical to achieving low-emission targets set by governments. Stochastic computer models allow us to explore future scenarios to aid decision making whilst considering the most relevant uncertainties. Complex stochastic computer models can be prohibitively slow and thus an emulator may be constructed and deployed to allow for efficient computation. We present a novel heteroscedastic Gaussian Process emulator which exploits cheap approximations to a stochastic offshore wind farm simulator. We conduct a probabilistic sensitivity analysis to understand the influence of key parameters in the wind farm simulator which will help us to plan a probability elicitation in the future.
In this paper, a Bayesian semiparametric copula approach is used to model the underlying multivariate distribution $F_{true}$. First, the Dirichlet process is constructed on the unknown marginal distributions of $F_{true}$. Then a Gaussian copula model is utilized to capture the dependence structure of $F_{true}$. As a result, a Bayesian multivariate normality test is developed by combining the relative belief ratio and the Energy distance. Several interesting theoretical results of the approach are derived. Finally, through several simulated examples and a real data set, the proposed approach reveals excellent performance.