ﻻ يوجد ملخص باللغة العربية
We describe a data-driven discovery method that leverages Simpsons paradox to uncover interesting patterns in behavioral data. Our method systematically disaggregates data to identify subgroups within a population whose behavior deviates significantly from the rest of the population. Given an outcome of interest and a set of covariates, the method follows three steps. First, it disaggregates data into subgroups, by conditioning on a particular covariate, so as minimize the variation of the outcome within the subgroups. Next, it models the outcome as a linear function of another covariate, both in the subgroups and in the aggregate data. Finally, it compares trends to identify disaggregations that produce subgroups with different behaviors from the aggregate. We illustrate the method by applying it to three real-world behavioral datasets, including Q&A site Stack Exchange and online learning platforms Khan Academy and Duolingo.
Identifying the factors that influence academic performance is an essential part of educational research. Previous studies have documented the importance of personality traits, class attendance, and social network structure. Because most of these ana
Simpsons paradox, or Yule-Simpson effect, arises when a trend appears in different subsets of data but disappears or reverses when these subsets are combined. We describe here seven cases of this phenomenon for chemo-kinematical relations believed to
We investigate how Simpsons paradox affects analysis of trends in social data. According to the paradox, the trends observed in data that has been aggregated over an entire population may be different from, and even opposite to, those of the underlyi
Recommendation systems are often evaluated based on users interactions that were collected from an existing, already deployed recommendation system. In this situation, users only provide feedback on the exposed items and they may not leave feedback o
We consider the Jeffreys-Lindley paradox from an objective Bayesian perspective by attempting to find priors representing complete indifference to sample size in the problem. This means that we ensure that the prior for the unknown mean and the prior