Mitigating Biases in Toxic Language Detection through Invariant Rationalization

Added by Yung-Sung Chuang

Publication date: 2021

Fields: Informatics Engineering

Language: English

Authors: Yung-Sung Chuang, Mingye Gao, Hongyin Luo

Computation and Language

Automatic detection of toxic language plays an essential role in protecting social media users, especially minority groups, from verbal abuse. However, biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection. These biases make the learned models unfair and can even exacerbate the marginalization of people. Considering that current debiasing methods for general natural language understanding tasks cannot effectively mitigate the biases in toxicity detectors, we propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns (e.g., identity mentions, dialect) with toxicity labels. We empirically show that our method yields a lower false positive rate on both lexical and dialectal attributes than previous debiasing methods.
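
The abstract reports results as false positive rates on lexical and dialectal attributes. As a rough illustration of that kind of metric only, the sketch below computes a per-group false positive rate, FP / (FP + TN), over non-toxic examples. The function name, the group names, and the toy data are all hypothetical and are not taken from the paper; the paper's actual evaluation protocol may differ.

```python
from collections import defaultdict


def false_positive_rate_by_group(labels, predictions, groups):
    """Return {group: FPR}, with FPR = FP / (FP + TN) among non-toxic examples.

    labels and predictions use 1 for toxic and 0 for non-toxic.
    Groups with no non-toxic examples are omitted from the result.
    """
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for y, y_hat, g in zip(labels, predictions, groups):
        if y == 0:  # only truly non-toxic examples count toward the FPR
            negatives[g] += 1
            if y_hat == 1:
                false_pos[g] += 1
    return {g: false_pos[g] / n for g, n in negatives.items()}


# Made-up toy data (assumption): four examples per hypothetical group.
labels = [0, 0, 0, 0, 1, 0, 0, 1]
predictions = [1, 0, 1, 0, 1, 0, 0, 1]
groups = ["identity_mention"] * 4 + ["other"] * 4

fpr = false_positive_rate_by_group(labels, predictions, groups)
print(fpr)
```

A large gap between groups on non-toxic text is the sort of disparity that debiasing methods of this kind aim to reduce.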

Towards Understanding and Mitigating Social Biases in Language Models

Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, 2021

As machine learning methods are deployed in real-world settings such as healthcare, legal systems, and social science, it is crucial to recognize how they shape social biases and stereotypes in these sensitive decision-making processes. Among such real-world deployments are large-scale pretrained language models (LMs) that can be potentially dangerous in manifesting undesirable representational biases - harmful biases resulting from stereotyping that propagate negative generalizations involving gender, race, religion, and other social constructs. As a step towards improving the fairness of LMs, we carefully define several sources of representational biases before proposing new benchmarks and metrics to measure them. With these tools, we propose steps towards mitigating social biases during text generation. Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information for high-fidelity text generation, thereby pushing forward the performance-fairness Pareto frontier.

Computation and Language Artificial Intelligence Computers and Society

Rationalization through Concepts

Diego Antognini, Boi Faltings, 2021

Automated predictions require explanations to be interpretable by humans. One type of explanation is a rationale, i.e., a selection of input features such as relevant text snippets from which the model computes the outcome. However, a single overall selection does not provide a complete explanation, e.g., weighing several aspects for decisions. To this end, we present a novel self-interpretable model called ConRAT. Inspired by how human explanations for high-level decisions are often based on key concepts, ConRAT extracts a set of text snippets as concepts and infers which ones are described in the document. Then, it explains the outcome with a linear aggregation of concepts. Two regularizers drive ConRAT to build interpretable concepts. In addition, we propose two techniques to boost the rationale and predictive performance further. Experiments on both single- and multi-aspect sentiment classification tasks show that ConRAT is the first to generate concepts that align with human rationalization while using only the overall label. Further, it outperforms state-of-the-art methods trained on each aspect label independently.

Computation and Language Machine Learning

Mitigating Political Bias in Language Models Through Reinforced Calibration

Ruibo Liu, Chenyan Jia, Jason Wei, 2021

Current large-scale language models can be politically biased as a result of the data they are trained on, potentially causing serious problems when they are deployed in real-world settings. In this paper, we describe metrics for measuring political bias in GPT-2 generation and propose a reinforcement learning (RL) framework for mitigating political biases in generated text. By using rewards from word embeddings or a classifier, our RL framework guides debiased generation without having access to the training data or requiring the model to be retrained. In empirical experiments on three attributes sensitive to political bias (gender, location, and topic), our methods reduced bias according to both our metrics and human evaluation, while maintaining readability and semantic coherence.

Computation and Language Artificial Intelligence

Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis

João A. Leite, Diego F. Silva, Kalina Bontcheva, 2020

Hate speech and toxic comments are a common concern of social media platform users. Although these comments are, fortunately, the minority on these platforms, they are still capable of causing harm. Therefore, identifying these comments is an important task for studying and preventing the proliferation of toxicity in social media. Previous work on automatically detecting toxic comments focuses mainly on English, with very little work on languages like Brazilian Portuguese. In this paper, we propose a new large-scale dataset for Brazilian Portuguese with tweets annotated as either toxic or non-toxic or with different types of toxicity. We present our dataset collection and annotation process, where we aimed to select candidates covering multiple demographic groups. State-of-the-art BERT models were able to achieve a 76% macro-F1 score using monolingual data in the binary case. We also show that large-scale monolingual data is still needed to create more accurate models, despite recent advances in multilingual approaches. An error analysis and experiments with multi-label classification show the difficulty of classifying certain types of toxic comments that appear less frequently in our data and highlight the need to develop models that are aware of different categories of toxicity.

Computation and Language Machine Learning Social and Information Networks

Mitigating Shear-dependent Object Detection Biases with Metacalibration

Erin S. Sheldon, Matthew R. Becker, Niall MacCrann, 2019

Metacalibration is a new technique for measuring weak gravitational lensing shear that is unbiased for isolated galaxy images. In this work we test metacalibration with overlapping, or "blended", galaxy images. Using standard metacalibration, we find a few percent shear measurement bias for galaxy densities relevant for current surveys, and that this bias increases with increasing galaxy number density. We show that this bias is not due to blending itself, but rather to shear-dependent object detection. If object detection is shear independent, no deblending of images is needed, in principle. We demonstrate that detection biases are accurately removed when including object detection in the metacalibration process, a technique we call metadetection. This process involves applying an artificial shear to images of small regions of sky and performing detection on the sheared images, as well as measurements that are used to calculate a shear response. We demonstrate that the method can accurately recover weak shear signals even in highly blended scenes. In the metacalibration process, the space between objects is sheared coherently, which does not perfectly match the real universe in which some, but not all, galaxy images are sheared coherently. We find that even for the worst-case scenario, in which the space between objects is completely unsheared, the resulting shear bias is at most a few tenths of a percent for future surveys. We discuss additional technical challenges that must be met in order to implement metadetection for real surveys.

Cosmology and Nongalactic Astrophysics