Facial analysis models are increasingly used in applications that have serious impacts on people's lives, ranging from authentication to surveillance tracking. It is therefore critical to develop techniques that can reveal unintended biases in facial classifiers to help guide the ethical use of facial analysis technology. This work proposes a framework called image counterfactual sensitivity analysis, which we explore as a proof-of-concept in analyzing a smiling-attribute classifier trained on faces of celebrities. The framework uses counterfactuals to examine how a classifier's prediction changes if a face characteristic is slightly altered. We leverage recent advances in generative adversarial networks to build a realistic generative model of face images that affords controlled manipulation of specific image characteristics. We then introduce a set of metrics that measure the effect of manipulating a specific property on the output of the trained classifier. Empirically, we find several factors of variation that affect the predictions of the smiling classifier. This proof-of-concept demonstrates potential ways generative models can be leveraged for fine-grained analysis of bias and fairness.
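To make the sensitivity measurement concrete, the following is a minimal sketch, assuming a hypothetical `generator(z)` that maps latent codes to face images, a hypothetical `classifier(images)` that returns smiling scores in [0, 1], and a latent `attribute_direction` assumed to control a single face characteristic; it is an illustration of the idea, not the paper's exact metrics.

```python
import numpy as np

def counterfactual_sensitivity(generator, classifier, z_batch, attribute_direction, delta=1.0):
    """Measure how a classifier's score moves when one latent attribute
    direction is perturbed, holding everything else fixed.

    generator(z) -> batch of images, classifier(images) -> scores in [0, 1].
    `attribute_direction` is a vector in latent space (hypothetical interface).
    """
    base_scores = classifier(generator(z_batch))
    cf_scores = classifier(generator(z_batch + delta * attribute_direction))
    shifts = cf_scores - base_scores
    return {
        "mean_shift": float(np.mean(shifts)),          # directional effect on the score
        "mean_abs_shift": float(np.mean(np.abs(shifts))),
        "flip_rate": float(np.mean((base_scores > 0.5) != (cf_scores > 0.5))),  # decision flips
    }
```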
This report examines the previously introduced Pinned AUC metric and highlights some of its limitations. Pinned AUC provides a threshold-agnostic measure of unintended bias in a classification model, inspired by the ROC-AUC metric. However, as we show in this report, the metric can obscure different kinds of unintended bias when the underlying class distributions on which bias is measured are not carefully controlled.
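For reference, here is a minimal sketch of how Pinned AUC is typically computed: score the model on a "pinned" evaluation set built from equal-size samples of subgroup and background examples, so the subgroup makes up a fixed share of the set. The function name, the `n_samples` default, and the sampling details are assumptions of this sketch, not the report's exact protocol.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pinned_auc(y_true, y_score, in_subgroup, n_samples=500, seed=0):
    """Pinned AUC for one identity subgroup.

    y_true, y_score: arrays of labels and model scores.
    in_subgroup: boolean array marking subgroup membership.
    """
    rng = np.random.default_rng(seed)
    sub_idx = np.flatnonzero(in_subgroup)
    bg_idx = np.flatnonzero(~in_subgroup)
    k = min(n_samples, len(sub_idx), len(bg_idx))
    # Equal-size samples of subgroup and background form the pinned set.
    pinned = np.concatenate([
        rng.choice(sub_idx, size=k, replace=False),
        rng.choice(bg_idx, size=k, replace=False),
    ])
    return roc_auc_score(y_true[pinned], y_score[pinned])
```

Note that the resulting value depends on the class balance within each sample, which is one place the limitations discussed above can enter.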
Building accurate DNN models requires training on large, labeled, context-specific datasets, especially ones matching the target scenario. We believe advances in wireless localization, working in unison with cameras, can produce automated annotation of targets in images and videos captured in the wild. Using pedestrian and vehicle detection as examples, we demonstrate the feasibility, benefits, and challenges of an automatic image annotation system. Our work calls for new technical development in passive localization, mobile data analytics, and error-resilient ML models, and raises design questions around user privacy policies.
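To illustrate the camera/localization coupling, here is a minimal sketch of one way a wireless localization estimate could seed an image annotation: project the estimated 3D position into pixel coordinates and size a bounding box from an assumed metric target size. The pinhole-projection setup, the function name, and the size prior are assumptions of this sketch, not the system described above.

```python
import numpy as np

def localize_to_bbox(world_xyz, K, R, t, target_size=(0.5, 1.7)):
    """Project a localization estimate (3D world point) into pixel space
    and return an approximate (x1, y1, x2, y2) box for annotation.

    K: 3x3 camera intrinsics; R, t: world-to-camera rotation/translation.
    target_size: assumed (width, height) of the target in metres.
    The target is assumed to be in front of the camera (positive depth).
    """
    cam = R @ np.asarray(world_xyz) + t          # point in camera coordinates
    u, v = (K @ cam)[:2] / cam[2]                # pinhole projection to pixels
    # Pixel extent from metric size scaled by focal length / depth.
    w_px = K[0, 0] * target_size[0] / cam[2]
    h_px = K[1, 1] * target_size[1] / cam[2]
    return (u - w_px / 2, v - h_px / 2, u + w_px / 2, v + h_px / 2)
```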
Unintended bias in machine learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias by considering the various ways that a classifier's score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use it to show how our metrics can surface new and potentially subtle unintended bias in existing public models.
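One common way to instantiate threshold-agnostic, distribution-aware metrics of this kind is to compute ROC-AUC over different slices of subgroup and background data. The sketch below uses slice definitions often referred to as subgroup AUC, BPSN AUC, and BNSP AUC; treat the names and exact slicing as assumptions of this sketch rather than the paper's precise definitions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bias_aucs(y_true, y_score, in_subgroup):
    """Three sliced AUCs for one subgroup (boolean mask `in_subgroup`):
      - subgroup AUC: separability within the subgroup alone,
      - BPSN AUC: background positives vs. subgroup negatives,
      - BNSP AUC: background negatives vs. subgroup positives.
    Each slice must contain both positive and negative examples.
    """
    sub, bg = in_subgroup, ~in_subgroup
    pos, neg = y_true == 1, y_true == 0

    def auc(mask):
        return roc_auc_score(y_true[mask], y_score[mask])

    return {
        "subgroup_auc": auc(sub),
        "bpsn_auc": auc((bg & pos) | (sub & neg)),
        "bnsp_auc": auc((bg & neg) | (sub & pos)),
    }
```

Low BPSN AUC, for example, indicates that subgroup negatives tend to score higher than background positives, i.e. a higher false-positive rate for the subgroup at many thresholds.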
While Visual Question Answering (VQA) models continue to push the state of the art forward, they largely remain black boxes, failing to provide insight into how or why an answer is generated. In this ongoing work, we propose addressing this shortcoming by learning to generate counterfactual images for a VQA model: given a question-image pair, we wish to generate a new image such that (i) the VQA model outputs a different answer, (ii) the new image is minimally different from the original, and (iii) the new image is realistic. Our hope is that providing such counterfactual examples allows users to investigate and understand the VQA model's internal mechanisms.
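The three requirements above translate naturally into a weighted loss that can be minimized by gradient descent on the image (or on a generator's latent code). The sketch below shows one such step in PyTorch; the `vqa_model` interface, the optional `realism_prior`, the step size, and the loss weights are all assumptions of this sketch, not the authors' method.

```python
import torch
import torch.nn.functional as F

def counterfactual_image_step(vqa_model, question, original_image, image,
                              target_answer, lambda_dist=1.0, lambda_real=0.1,
                              realism_prior=None, lr=0.01):
    """One gradient step toward a counterfactual image for a VQA model.

    vqa_model(image, question) -> answer logits of shape (1, num_answers).
    target_answer: index of a different answer to push the model toward.
    realism_prior(image) -> scalar penalty (optional, hypothetical).
    """
    image = image.clone().requires_grad_(True)
    logits = vqa_model(image, question)
    answer_loss = F.cross_entropy(logits, torch.tensor([target_answer]))   # (i) change the answer
    dist_loss = F.mse_loss(image, original_image)                          # (ii) minimal change
    real_loss = realism_prior(image) if realism_prior is not None else 0.0 # (iii) stay realistic
    loss = answer_loss + lambda_dist * dist_loss + lambda_real * real_loss
    loss.backward()
    with torch.no_grad():
        image -= lr * image.grad
    return image.detach(), float(loss)
```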
Recent research demonstrates that word embeddings trained on human-generated corpora encode strong gender biases in the embedding space, and that these biases can lead to discriminatory results in various downstream tasks. Whereas previous methods project word embeddings onto a linear subspace for debiasing, we introduce a Latent Disentanglement method built on a siamese auto-encoder with an adapted gradient reversal layer. Our structure separates the semantic and gender latent information of a given word into disjoint latent dimensions. We then introduce Counterfactual Generation to flip the gender information of words, so that the original and modified embeddings can produce a gender-neutralized word embedding after geometric alignment regularization, without loss of semantic information. Across various quantitative and qualitative debiasing experiments, our method outperforms existing approaches to debiasing word embeddings. In addition, our method preserves semantic information during debiasing, minimizing semantic losses on extrinsic downstream NLP tasks.
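The gradient reversal layer mentioned above is a standard adversarial-training building block: it passes activations through unchanged on the forward pass and negates (and scales) gradients on the backward pass, so an encoder is trained to remove the information an adversary head (here, a gender predictor) relies on. Below is a minimal, generic PyTorch sketch of such a layer, not the paper's adapted variant.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negated, scaled gradient on the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing back into the encoder so it learns to
        # *remove* the information the adversary head predicts.
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage sketch (hypothetical modules):
# gender_logits = gender_head(grad_reverse(encoder(word_embedding)))
```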