Image Counterfactual Sensitivity Analysis for Detecting Unintended Bias

161 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Emily Denton

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Emily Denton - Ben Hutchinson - Margaret Mitchell

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Facial analysis models are increasingly used in applications that have serious impacts on peoples lives, ranging from authentication to surveillance tracking. It is therefore critical to develop techniques that can reveal unintended biases in facial classifiers to help guide the ethical use of facial analysis technology. This work proposes a framework called textit{image counterfactual sensitivity analysis}, which we explore as a proof-of-concept in analyzing a smiling attribute classifier trained on faces of celebrities. The framework utilizes counterfactuals to examine how a classifiers prediction changes if a face characteristic slightly changes. We leverage recent advances in generative adversarial networks to build a realistic generative model of face images that affords controlled manipulation of specific image characteristics. We then introduce a set of metrics that measure the effect of manipulating a specific property on the output of the trained classifier. Empirically, we find several different factors of variation that affect the predictions of the smiling classifier. This proof-of-concept demonstrates potential ways generative models can be leveraged for fine-grained analysis of bias and fairness.

قيم البحث

110 - Daniel Borkan , Lucas Dixon , John Li 2019

This report examines the Pinned AUC metric introduced and highlights some of its limitations. Pinned AUC provides a threshold-agnostic measure of unintended bias in a classification model, inspired by the ROC-AUC metric. However, as we highlight in t his report, there are ways that the metric can obscure different kinds of unintended biases when the underlying class distributions on which bias is being measured are not carefully controlled.

التعلم الالي التعلم الآلي

Addressing Training Bias via Automated Image Annotation

141 - Zhujun Xiao , Yanzi Zhu , Yuxin Chen 2018

Build accurate DNN models requires training on large labeled, context specific datasets, especially those matching the target scenario. We believe advances in wireless localization, working in unison with cameras, can produce automated annotation of targets on images and videos captured in the wild. Using pedestrian and vehicle detection as examples, we demonstrate the feasibility, benefits, and challenges of an automatic image annotation system. Our work calls for new technical development on passive localization, mobile data analytics, and error-resilient ML models, as well as design issues in user privacy policies.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي التعلم الالي

Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification

189 - Daniel Borkan , Lucas Dixon , Jeffrey Sorensen 2019

Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-ag nostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifiers score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use this to show how our metrics can be used to find new and potentially subtle unintended bias in existing public models.

التعلم الآلي الحساب واللغة التعلم الالي

Question-Conditioned Counterfactual Image Generation for VQA

83 - Jingjing Pan , Yash Goyal , Stefan Lee 2019

While Visual Question Answering (VQA) models continue to push the state-of-the-art forward, they largely remain black-boxes - failing to provide insight into how or why an answer is generated. In this ongoing work, we propose addressing this shortcom ing by learning to generate counterfactual images for a VQA model - i.e. given a question-image pair, we wish to generate a new image such that i) the VQA model outputs a different answer, ii) the new image is minimally different from the original, and iii) the new image is realistic. Our hope is that providing such counterfactual examples allows users to investigate and understand the VQA models internal mechanisms.

الرؤية الحاسوبية وتمييز الأنماط الحساب واللغة

Neutralizing Gender Bias in Word Embedding with Latent Disentanglement and Counterfactual Generation

85 - Seungjae Shin , Kyungwoo Song , JoonHo Jang 2020

Recent research demonstrates that word embeddings, trained on the human-generated corpus, have strong gender biases in embedding spaces, and these biases can result in the discriminative results from the various downstream tasks. Whereas the previous methods project word embeddings into a linear subspace for debiasing, we introduce a textit{Latent Disentanglement} method with a siamese auto-encoder structure with an adapted gradient reversal layer. Our structure enables the separation of the semantic latent information and gender latent information of given word into the disjoint latent dimensions. Afterwards, we introduce a textit{Counterfactual Generation} to convert the gender information of words, so the original and the modified embeddings can produce a gender-neutralized word embedding after geometric alignment regularization, without loss of semantic information. From the various quantitative and qualitative debiasing experiments, our method shows to be better than existing debiasing methods in debiasing word embeddings. In addition, Our method shows the ability to preserve semantic information during debiasing by minimizing the semantic information losses for extrinsic NLP downstream tasks.

الحساب واللغة التعلم الآلي التعلم الالي