New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Adversarial Scrubbing of Demographic Information for Text Classification

فرك الخصم من المعلومات الديموغرافية لتصنيف النص

716 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

تخفيف العرقية التي تعتمد على اللغة adversarial scrubbing target task التنظيف الخصم المهمة المستهدفة صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Contextual representations learned by language models can often encode undesirable attributes, like demographic associations of the users, while being trained for an unrelated target task. We aim to scrub such undesirable attributes and learn fair representations while maintaining performance on the target task. In this paper, we present an adversarial learning framework Adversarial Scrubber'' (AdS), to debias contextual representations. We perform theoretical analysis to show that our framework converges without leaking demographic information under certain conditions. We extend previous evaluation techniques by evaluating debiasing performance using Minimum Description Length (MDL) probing. Experimental evaluations on 8 datasets show that AdS generates representations with minimal information about demographic attributes while being maximally informative about the target task.

References used

https://aclanthology.org/

rate research

Continual Learning for Text Classification with Information Disentanglement Based Regularization

340 - Association for Computation Linguistics 2021 مقالة

Continual learning has become increasingly important as it enables NLP models to constantly learn and gain knowledge over time. Previous continual learning methods are mainly designed to preserve knowledge from previous tasks, without much emphasis o n how to well generalize models to new tasks. In this work, we propose an information disentanglement based regularization method for continual learning on text classification. Our proposed method first disentangles text hidden spaces into representations that are generic to all tasks and representations specific to each individual task, and further regularizes these representations differently to better constrain the knowledge required to generalize. We also introduce two simple auxiliary tasks: next sentence prediction and task-id prediction, for learning better generic and specific representation spaces. Experiments conducted on large-scale benchmarks demonstrate the effectiveness of our method in continual text classification tasks with various sequences and lengths over state-of-the-art baselines. We have publicly released our code at https://github.com/GT-SALT/IDBR.

information disentanglement based disentanglement based regularization disentanglement based محطات المعلومات القائمة التنظيم القائم على DEFENTANGLE DEVENTANGLEMENE صناعة حمض الفوسفور المزيد..

HTCInfoMax: A Global Model for Hierarchical Text Classification via Information Maximization

331 - Association for Computation Linguistics 2021 مقالة

The current state-of-the-art model HiAGM for hierarchical text classification has two limitations. First, it correlates each text sample with all labels in the dataset which contains irrelevant information. Second, it does not consider any statistica l constraint on the label representations learned by the structure encoder, while constraints for representation learning are proved to be helpful in previous work. In this paper, we propose HTCInfoMax to address these issues by introducing information maximization which includes two modules: text-label mutual information maximization and label prior matching. The first module can model the interaction between each text sample and its ground truth labels explicitly which filters out irrelevant information. The second one encourages the structure encoder to learn better representations with desired characteristics for all labels which can better handle label imbalance in hierarchical text classification. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed HTCInfoMax.

hierarchical text classification hierarchical text تصنيف النص الهرمي النص الهرمي صناعة حمض الفوسفور

Universal Adversarial Attacks with Natural Triggers for Text Classification

399 - Association for Computation Linguistics 2021 مقالة

Recent work has demonstrated the vulnerability of modern text classifiers to universal adversarial attacks, which are input-agnostic sequences of words added to text processed by classifiers. Despite being successful, the word sequences produced in s uch attacks are often ungrammatical and can be easily distinguished from natural text. We develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems when added to benign inputs. We leverage an adversarially regularized autoencoder (ARAE) to generate triggers and propose a gradient-based search that aims to maximize the downstream classifier's prediction loss. Our attacks effectively reduce model accuracy on classification tasks while being less identifiable than prior models as per automatic detection metrics and human-subject studies. Our aim is to demonstrate that adversarial attacks can be made harder to detect than previously thought and to enable the development of appropriate defenses.

universal adversarial attacks adversarial attacks universal adversarial الهجمات الخصومة العالمية هجمات الخصومة الخصم العالمي صناعة حمض الفوسفور المزيد..

Continual Few-Shot Learning for Text Classification

557 - Association for Computation Linguistics 2021 مقالة

Natural Language Processing (NLP) is increasingly relying on general end-to-end systems that need to handle many different linguistic phenomena and nuances. For example, a Natural Language Inference (NLI) system has to recognize sentiment, handle num bers, perform coreference, etc. Our solutions to complex problems are still far from perfect, so it is important to create systems that can learn to correct mistakes quickly, incrementally, and with little training data. In this work, we propose a continual few-shot learning (CFL) task, in which a system is challenged with a difficult phenomenon and asked to learn to correct mistakes with only a few (10 to 15) training examples. To this end, we first create benchmarks based on previously annotated data: two NLI (ANLI and SNLI) and one sentiment analysis (IMDB) datasets. Next, we present various baselines from diverse paradigms (e.g., memory-aware synapses and Prototypical networks) and compare them on few-shot learning and continual few-shot learning setups. Our contributions are in creating a benchmark suite and evaluation protocol for continual few-shot learning on the text classification tasks, and making several interesting observations on the behavior of similarity-based methods. We hope that our work serves as a useful starting point for future work on this important topic.

نقل متبرع عبر اللغات continual few-shot learning عدد قليل من التعلم قليلا صناعة حمض الفوسفور

AEDA: An Easier Data Augmentation Technique for Text Classification

384 - Association for Computation Linguistics 2021 مقالة

This paper proposes AEDA (An Easier Data Augmentation) technique to help improve the performance on text classification tasks. AEDA includes only random insertion of punctuation marks into the original text. This is an easier technique to implement f or data augmentation than EDA method (Wei and Zou, 2019) with which we compare our results. In addition, it keeps the order of the words while changing their positions in the sentence leading to a better generalized performance. Furthermore, the deletion operation in EDA can cause loss of information which, in turn, misleads the network, whereas AEDA preserves all the input information. Following the baseline, we perform experiments on five different datasets for text classification. We show that using the AEDA-augmented data for training, the models show superior performance compared to using the EDA-augmented data in all five datasets. The source code will be made available for further study and reproduction of the results.

easier data augmentation data augmentation technique أسهل تكبير البيانات تقنية تكبير البيانات صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Adversarial Scrubbing of Demographic Information for Text Classification

فرك الخصم من المعلومات الديموغرافية لتصنيف النص

Ask ChatGPT about the research

Read More

suggested questions