FairPrep: Promoting Data to a First-Class Citizen in Studies on Fairness-Enhancing Interventions

79 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Julia Stoyanovich

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Sebastian Schelter - Yuxuan He - Jatin Khilnani

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The importance of incorporating ethics and legal compliance into machine-assisted decision-making is broadly recognized. Further, several lines of recent work have argued that critical opportunities for improving data quality and representativeness, controlling for bias, and allowing humans to oversee and impact computational processes are missed if we do not consider the lifecycle stages upstream from model training and deployment. Yet, very little has been done to date to provide system-level support to data scientists who wish to develop and deploy responsible machine learning methods. We aim to fill this gap and present FairPrep, a design and evaluation framework for fairness-enhancing interventions. FairPrep is based on a developer-centered design, and helps data scientists follow best practices in software engineering and machine learning. As part of our contribution, we identify shortcomings in existing empirical studies for analyzing fairness-enhancing interventions. We then show how FairPrep can be used to measure the impact of sound best practices, such as hyperparameter tuning and feature scaling. In particular, our results suggest that the high variability of the outcomes of fairness-enhancing interventions observed in previous studies is often an artifact of a lack of hyperparameter tuning. Further, we show that the choice of a data cleaning method can impact the effectiveness of fairness-enhancing interventions.

قيم البحث

335 - Chloe Bakalar , Renata Barreto , Stevie Bergman 2021

Many technical approaches have been proposed for ensuring that decisions made by machine learning systems are fair, but few of these proposals have been stress-tested in real-world systems. This paper presents an example of one teams approach to the challenge of applying algorithmic fairness approaches to complex production systems within the context of a large technology company. We discuss how we disentangle normative questions of product and policy design (like, how should the system trade off between different stakeholders interests and needs?) from empirical questions of system implementation (like, is the system achieving the desired tradeoff in practice?). We also present an approach for answering questions of the latter sort, which allows us to measure how machine learning systems and human labelers are making these tradeoffs across different relevant groups. We hope our experience integrating fairness tools and approaches into large-scale and complex production systems will be useful to other practitioners facing similar challenges, and illuminating to academics and researchers looking to better address the needs of practitioners.

التعلم الآلي أجهزة الكمبيوتر والمجتمع

Fair Data Integration

105 - Sainyam Galhotra , Karthikeyan Shanmugam , Prasanna Sattigeri 2020

The use of machine learning (ML) in high-stakes societal decisions has encouraged the consideration of fairness throughout the ML lifecycle. Although data integration is one of the primary steps to generate high quality training data, most of the fai rness literature ignores this stage. In this work, we consider fairness in the integration component of data management, aiming to identify features that improve prediction without adding any bias to the dataset. We work under the causal interventional fairness paradigm. Without requiring the underlying structural causal model a priori, we propose an approach to identify a sub-collection of features that ensure the fairness of the dataset by performing conditional independence tests between different subsets of features. We use group testing to improve the complexity of the approach. We theoretically prove the correctness of the proposed algorithm to identify features that ensure interventional fairness and show that sub-linear conditional independence tests are sufficient to identify these variables. A detailed empirical evaluation is performed on real-world datasets to demonstrate the efficacy and efficiency of our technique.

التعلم الآلي أجهزة الكمبيوتر والمجتمع قواعد البيانات

Feature-based Individual Fairness in k-Clustering

150 - Debajyoti Kar , Sourav Medya , Debmalya Mandal 2021

Ensuring fairness in machine learning algorithms is a challenging and important task. We consider the problem of clustering a set of points while ensuring fairness constraints. While there have been several attempts to capture group fairness in the k -clustering problem, fairness at an individual level is not well-studied. We introduce a new notion of individual fairness in k-clustering based on features that are not necessarily used for clustering. We show that this problem is NP-hard and does not admit a constant factor approximation. We then design a randomized algorithm that guarantees approximation both in terms of minimizing the clustering distance objective as well as individual fairness under natural restrictions on the distance metric and fairness constraints. Finally, our experimental results validate that our algorithm produces lower clustering costs compared to existing algorithms while being competitive in individual fairness.

التعلم الآلي أجهزة الكمبيوتر والمجتمع بنى وهياكل البيانات والخوارزميات

Transparency Tools for Fairness in AI (Luskin)

84 - Mingliang Chen , Aria Shahverdi , Sarah Anderson 2020

We propose new tools for policy-makers to use when assessing and correcting fairness and bias in AI algorithms. The three tools are: - A new definition of fairness called controlled fairness with respect to choices of protected features and filters . The definition provides a simple test of fairness of an algorithm with respect to a dataset. This notion of fairness is suitable in cases where fairness is prioritized over accuracy, such as in cases where there is no ground truth data, only data labeled with past decisions (which may have been biased). - Algorithms for retraining a given classifier to achieve controlled fairness with respect to a choice of features and filters. Two algorithms are presented, implemented and tested. These algorithms require training two different models in two stages. We experiment with combinations of various types of models for the first and second stage and report on which combinations perform best in terms of fairness and accuracy. - Algorithms for adjusting model parameters to achieve a notion of fairness called classification parity. This notion of fairness is suitable in cases where accuracy is prioritized. Two algorithms are presented, one which assumes that protected features are accessible to the model during testing, and one which assumes protected features are not accessible during testing. We evaluate our tools on three different publicly available datasets. We find that the tools are useful for understanding various dimensions of bias, and that in practice the algorithms are effective in starkly reducing a given observed bias when tested on new data.

التعلم الآلي أجهزة الكمبيوتر والمجتمع التعلم الالي

FairGAN: Fairness-aware Generative Adversarial Networks

174 - Depeng Xu , Shuhan Yuan , Lu Zhang 2018

Fairness-aware learning is increasingly important in data mining. Discrimination prevention aims to prevent discrimination in the training data before it is used to conduct predictive analysis. In this paper, we focus on fair data generation that ens ures the generated data is discrimination free. Inspired by generative adversarial networks (GAN), we present fairness-aware generative adversarial networks, called FairGAN, which are able to learn a generator producing fair data and also preserving good data utility. Compared with the naive fair data generation models, FairGAN further ensures the classifiers which are trained on generated data can achieve fair classification on real data. Experiments on a real dataset show the effectiveness of FairGAN.

التعلم الآلي أجهزة الكمبيوتر والمجتمع التعلم الالي