
Can You Trust the Trend? Discovering Simpson's Paradoxes in Social Data

Added by Kristina Lerman
Publication date: 2018
Language: English





We investigate how Simpson's paradox affects analysis of trends in social data. According to the paradox, the trends observed in data that has been aggregated over an entire population may be different from, and even opposite to, those of the underlying subgroups. Failure to take this effect into account can lead an analysis to wrong conclusions. We present a statistical method to automatically identify Simpson's paradox in data by comparing statistical trends in the aggregate data to those in the disaggregated subgroups. We apply the approach to data from Stack Exchange, a popular question-answering platform, to analyze factors affecting answerer performance, specifically, the likelihood that an answer written by a user will be accepted by the asker as the best answer to his or her question. Our analysis confirms a known Simpson's paradox and identifies several new instances. These paradoxes provide novel insights into user behavior on Stack Exchange.
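The core check behind the method is simple to state: fit a trend on the aggregate data, fit the same trend within each subgroup, and flag cases where the two disagree in sign. Below is a minimal Python sketch of that idea; it is not the paper's implementation, and the column names, linear-trend model, and significance threshold are illustrative assumptions.

```python
# Minimal sketch: flag a possible Simpson's paradox when the trend in the
# aggregated data disagrees in sign with significant trends inside subgroups.
# Column names, thresholds, and the linear model are assumptions for this sketch.
import pandas as pd
from scipy import stats

def simpson_check(df: pd.DataFrame, x: str, y: str, group: str, min_size: int = 30):
    """Compare the aggregate x-y trend with the trends inside each subgroup."""
    agg_slope, _, _, agg_p, _ = stats.linregress(df[x], df[y])
    reversed_groups = []
    for g, sub in df.groupby(group):
        if len(sub) < min_size:
            continue  # skip subgroups too small for a stable estimate
        slope, _, _, p, _ = stats.linregress(sub[x], sub[y])
        # A subgroup "reverses" the aggregate trend if its significant slope
        # has the opposite sign of the (significant) aggregate slope.
        if p < 0.05 and agg_p < 0.05 and slope * agg_slope < 0:
            reversed_groups.append(g)
    return agg_slope, reversed_groups

# Hypothetical usage with an answers table (columns are assumed, not from the paper):
# agg, rev = simpson_check(answers, x="answer_position", y="accepted", group="user_id")
```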



Related research

We describe a data-driven discovery method that leverages Simpson's paradox to uncover interesting patterns in behavioral data. Our method systematically disaggregates data to identify subgroups within a population whose behavior deviates significantly from the rest of the population. Given an outcome of interest and a set of covariates, the method follows three steps. First, it disaggregates data into subgroups by conditioning on a particular covariate, chosen so as to minimize the variation of the outcome within the subgroups. Next, it models the outcome as a linear function of another covariate, both in the subgroups and in the aggregate data. Finally, it compares trends to identify disaggregations that produce subgroups whose behavior differs from the aggregate. We illustrate the method by applying it to three real-world behavioral datasets, including the Q&A site Stack Exchange and the online learning platforms Khan Academy and Duolingo.
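As a rough illustration of the three steps above, the sketch below picks a conditioning covariate by minimizing the average within-subgroup variance of the outcome (via quantile binning), fits linear trends in the aggregate and in each subgroup, and reports subgroups whose slope sign disagrees with the aggregate. The binning scheme, model choice, and function names are assumptions for this sketch, not the authors' exact pipeline.

```python
# Rough sketch of the three-step procedure described above, under assumed
# column names; quantile binning and linear trends are illustrative choices.
import numpy as np
import pandas as pd
from scipy import stats

def best_conditioning_covariate(df, outcome, covariates, bins=10):
    """Step 1: pick the covariate whose binned subgroups minimize the
    average within-subgroup variance of the outcome."""
    scores = {}
    for c in covariates:
        groups = pd.qcut(df[c], q=bins, duplicates="drop")
        scores[c] = df.groupby(groups)[outcome].var().mean()
    return min(scores, key=scores.get)

def compare_trends(df, outcome, trend_covariate, conditioning, bins=10):
    """Steps 2-3: fit linear trends in the aggregate and in each subgroup,
    then report subgroups whose slope sign differs from the aggregate."""
    agg_slope = stats.linregress(df[trend_covariate], df[outcome]).slope
    groups = pd.qcut(df[conditioning], q=bins, duplicates="drop")
    disagreements = []
    for g, sub in df.groupby(groups):
        if len(sub) < 2 or sub[trend_covariate].nunique() < 2:
            continue  # not enough variation to fit a trend in this subgroup
        slope = stats.linregress(sub[trend_covariate], sub[outcome]).slope
        if np.sign(slope) != np.sign(agg_slope):
            disagreements.append((g, slope))
    return agg_slope, disagreements
```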
Most people consider their friends to be more positive than themselves, exhibiting a sentiment paradox. Psychology research attributes this paradox to human cognitive bias. To understand this phenomenon, we study sentiment paradoxes in social networks. Our work shows that the social connections (friends, followees, or followers) of users are indeed (not just illusively) more positive than the users themselves. This is mostly due to positive users having more friends. We identify five sentiment paradoxes at different network levels, ranging from triads to large-scale communities. Empirical and theoretical evidence is provided to validate the existence of such sentiment paradoxes. By investigating the relationships between the sentiment paradox and other well-known network paradoxes, i.e., the friendship paradox and the activity paradox, we find that user sentiments are positively correlated with their number of friends but rarely with their social activity. Finally, we demonstrate how sentiment paradoxes can be used to predict user sentiments.
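For concreteness, one way to test for a friendship-style sentiment paradox in a dataset is to check, user by user, whether the mean sentiment of a user's connections exceeds the user's own sentiment; a fraction of such users well above one half is consistent with the paradox. The sketch below assumes sentiment scores and friendship lists are available as plain dictionaries; the data layout is hypothetical.

```python
# Illustrative check for a friendship-style sentiment paradox: for each user,
# compare their own sentiment score with the mean score of their connections.
# The dictionary-based data layout is an assumption for this sketch.
def sentiment_paradox_fraction(sentiment, friends):
    """Return the fraction of users whose friends are, on average,
    more positive than they are."""
    count, total = 0, 0
    for user, own in sentiment.items():
        neigh = [sentiment[f] for f in friends.get(user, []) if f in sentiment]
        if not neigh:
            continue  # users without scored friends are excluded
        total += 1
        if sum(neigh) / len(neigh) > own:
            count += 1
    return count / total if total else float("nan")

# A value well above 0.5 is consistent with the paradox described above.
```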
In a world where traditional notions of privacy are increasingly challenged by the myriad companies that collect and analyze our data, it is important that decision-making entities are held accountable for unfair treatment arising from irresponsible data usage. Unfortunately, a lack of appropriate methodologies and tools means that even identifying unfair or discriminatory effects can be a challenge in practice. We introduce the unwarranted associations (UA) framework, a principled methodology for the discovery of unfair, discriminatory, or offensive user treatment in data-driven applications. The UA framework unifies and rationalizes a number of prior attempts at formalizing algorithmic fairness. It uniquely combines multiple investigative primitives and fairness metrics with broad applicability, granular exploration of unfair treatment in user subgroups, and incorporation of natural notions of utility that may account for observed disparities. We instantiate the UA framework in FairTest, the first comprehensive tool that helps developers check data-driven applications for unfair user treatment. It enables scalable and statistically rigorous investigation of associations between application outcomes (such as prices or premiums) and sensitive user attributes (such as race or gender). Furthermore, FairTest provides debugging capabilities that let programmers rule out potential confounders for observed unfair effects. We report on the use of FairTest to investigate, and in some cases address, disparate impact, offensive labeling, and uneven rates of algorithmic error in four data-driven applications. As examples, our results reveal subtle biases against older populations in the distribution of error in a predictive health application and offensive racial labeling in an image tagger.
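The kind of check FairTest automates can be illustrated with a much simpler, hand-rolled test: within each user subgroup, test whether the application outcome is statistically associated with a sensitive attribute. The sketch below is not FairTest's API; the chi-squared test, column names, and significance level are assumptions chosen for illustration.

```python
# Simplified illustration of subgroup association testing between an outcome
# and a sensitive attribute. Not FairTest's API; all names are assumptions.
import pandas as pd
from scipy import stats

def subgroup_associations(df, outcome, sensitive, subgroup, alpha=0.05):
    """Chi-squared test of outcome vs. sensitive attribute inside each subgroup."""
    flagged = []
    for g, sub in df.groupby(subgroup):
        table = pd.crosstab(sub[sensitive], sub[outcome])
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue  # no variation to test within this subgroup
        chi2, p, _, _ = stats.chi2_contingency(table)
        if p < alpha:
            flagged.append((g, chi2, p))
    return flagged
```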
Modern machine learning methods, including deep learning, have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive uncertainty. Quantifying uncertainty is especially critical in real-world settings, which often involve input distributions that are shifted from the training distribution due to a variety of factors, including sample bias and non-stationarity. In such settings, well-calibrated uncertainty estimates convey information about when a model's output should (or should not) be trusted. Many probabilistic deep learning methods, including Bayesian and non-Bayesian methods, have been proposed in the literature for quantifying predictive uncertainty, but to our knowledge there has not previously been a rigorous large-scale empirical comparison of these methods under dataset shift. We present a large-scale benchmark of existing state-of-the-art methods on classification problems and investigate the effect of dataset shift on accuracy and calibration. We find that traditional post-hoc calibration does indeed fall short, as do several other previous methods. However, some methods that marginalize over models give surprisingly strong results across a broad spectrum of tasks.
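One common way such benchmarks quantify calibration is the expected calibration error (ECE): the confidence-weighted gap between predicted confidence and observed accuracy across bins. The sketch below computes ECE from predicted confidences and correctness indicators; the 15-bin equal-width scheme is a conventional choice, not something tied to this particular paper.

```python
# Sketch of expected calibration error (ECE) from predicted confidences and
# 0/1 correctness labels. The binning scheme is a conventional assumption.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE: weighted average gap between mean confidence and accuracy per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece
```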
Recently, we introduced in arXiv:1105.2434 a model for product adoption in social networks with multiple products, where the agents, influenced by their neighbours, can adopt one out of several alternatives. We identify and analyze here four types of paradoxes that can arise in these networks. To this end, we use the social network games that we recently introduced in arXiv:1202.2209. These paradoxes shed light on possible inefficiencies arising when one modifies the sets of products available to the agents forming a social network. One of the paradoxes corresponds to the well-known Braess paradox in congestion games and shows that by adding more choices to a node, the network may end up in a situation that is worse for everybody. We exhibit a dual version of this, where removing available choices from someone can eventually make everybody better off. The other paradoxes that we identify show that adding or removing a product from the choice set of some node may lead to permanent instability. Finally, we also identify conditions under which some of these paradoxes cannot arise.