Exploiting Auxiliary Data for Offensive Language Detection with Bidirectional Transformers

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: offensive language detection, offensive language, language detection

Abstract

Offensive language detection (OLD) has received increasing attention due to its societal impact. Recent work shows that bidirectional transformer-based methods obtain impressive performance on OLD. However, such methods usually rely on large-scale, well-labeled OLD datasets for model training. To address data and label scarcity in OLD, we propose a simple yet effective domain adaptation approach for training bidirectional transformers. Our approach introduces domain adaptation (DA) training procedures to ALBERT so that it can effectively exploit auxiliary data from source domains to improve OLD performance in a target domain. Experimental results on benchmark datasets show that our approach, ALBERT (DA), obtains state-of-the-art performance in most cases. In particular, our approach benefits underrepresented and under-performing classes, with a significant improvement over ALBERT.
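The general idea of exploiting auxiliary source-domain data before training on a small target domain can be illustrated with a toy sketch. The abstract does not describe the actual DA training procedures, so everything below is an assumption made for illustration: a bias-free bag-of-words logistic regression stands in for ALBERT, "domain adaptation" is reduced to sequential training (source first, then target), and the tiny datasets and all function names are hypothetical.

```python
import math

def featurize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]

def score(weights, x):
    """Probability that x is offensive under a bias-free logistic model."""
    z = sum(wi * xi for wi, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(weights, examples, epochs=50, lr=0.5):
    """Continue SGD training from the given weights (toy stand-in for fine-tuning)."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in examples:
            error = score(w, x) - y
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical auxiliary (source-domain) data: larger and labeled.
source_data = [
    ("you are an idiot", 1),
    ("what a stupid moron", 1),
    ("shut up you stupid idiot", 1),
    ("have a nice day", 0),
    ("thanks for the help", 0),
    ("you are a great friend", 0),
]
# Hypothetical target-domain data: very few labels.
target_data = [
    ("that referee is an idiot", 1),
    ("great match today", 0),
]

vocab = sorted({tok for text, _ in source_data + target_data
                for tok in text.lower().split()})

def encode(data):
    return [(featurize(text, vocab), label) for text, label in data]

zeros = [0.0] * len(vocab)
# Baseline: train on the small target set only.
target_only = train(zeros, encode(target_data))
# Assumed DA scheme: train on source data first, then continue on target data.
adapted = train(train(zeros, encode(source_data)), encode(target_data))

query = featurize("stupid moron", vocab)  # words seen only in the source domain
print(score(target_only, query), score(adapted, query))
```

The target-only model has never seen the query words, so its score stays at exactly 0.5, while the adapted model retains the evidence learned from the source domain. This only illustrates why auxiliary data can help when target labels are scarce; it makes no claim about how ALBERT (DA) is actually trained.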

References used

https://aclanthology.org/

Leveraging Offensive Language for Sarcasm and Sentiment Detection in Arabic

Association for Computational Linguistics, 2021 (article)

Sarcasm detection is one of the most challenging tasks in text classification, particularly for informal Arabic with its high syntactic and semantic ambiguity. We propose two systems that harness knowledge from multiple tasks to improve the performance of the classifier. This paper presents the systems used in our participation in the two sub-tasks of the Sixth Arabic Natural Language Processing Workshop (WANLP): Sarcasm Detection and Sentiment Analysis. Our methodology is driven by the hypothesis that tweets with negative sentiment and tweets with sarcastic content are more likely to contain offensive content, so fine-tuning the classification model on a large corpus of offensive language helps the model learn to detect sentiment and sarcasm effectively. Results show that our approach is more effective for the sarcasm detection task than for the sentiment analysis task.

Keywords: leveraging offensive language, sixth arabic natural

AuGPT: Auxiliary Tasks and Data Augmentation for End-To-End Dialogue with Pre-Trained Language Models

Association for Computational Linguistics, 2021 (article)

Attention-based pre-trained language models such as GPT-2 brought considerable progress to end-to-end dialogue modelling. However, they also present considerable risks for task-oriented dialogue, such as lack of knowledge grounding or diversity. To address these issues, we introduce modified training objectives for language model finetuning, and we employ massive data augmentation via back-translation to increase the diversity of the training data. We further examine the possibilities of combining data from multiple sources to improve performance on the target dataset. We carefully evaluate our contributions with both human and automatic methods. Our model substantially outperforms the baseline on the MultiWOZ data and shows competitive performance with the state of the art in both automatic and human evaluation.

Keywords: auxiliary tasks, attention-based, pre-trained language

InFillmore: Frame-Guided Language Generation with Bidirectional Context

Association for Computational Linguistics, 2021 (article)

We propose a structured extension to bidirectional-context conditional language generation, or "infilling," inspired by Frame Semantic theory. Guidance is provided through one of two approaches: (1) model fine-tuning, conditioning directly on observed symbolic frames, and (2) a novel extension to disjunctive lexically constrained decoding that leverages frame semantic lexical units. Automatic and human evaluations confirm that frame-guided generation allows for explicit manipulation of intended infill semantics, with minimal loss in distinguishability from human-generated text. Our methods flexibly apply to a variety of use scenarios, and we provide an interactive web demo.

Keywords: bidirectional context, language generation

Target-Aware Data Augmentation for Stance Detection

Association for Computational Linguistics, 2021 (article)

The goal of stance detection is to identify whether the author of a text is in favor of, neutral toward, or against a specific target. Despite substantial progress on this task, one of the remaining challenges is the scarcity of annotations. Data augmentation is commonly used to address annotation scarcity by generating more training samples. However, the augmented sentences generated by existing methods are either less diversified or inconsistent with the given target and stance label. In this paper, we formulate the data augmentation of stance detection as a conditional masked language modeling task and augment the dataset by predicting the masked word conditioned on both its context and an auxiliary sentence that contains target and label information. Moreover, we propose another simple yet effective method that generates target-aware sentences by replacing a target mention with another. Experimental results show that our proposed methods significantly outperform previous augmentation methods on 11 targets.
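The second method mentioned in this abstract, replacing a target mention with another target, can be sketched roughly as follows. The abstract gives no implementation details, so this is only an assumed reading: the function names, the sample format, and in particular the choice to carry the original stance label over to the new target are hypothetical.

```python
import re

def swap_target(text, old_target, new_target):
    """Replace whole-word mentions of old_target; return None if none are found."""
    pattern = re.compile(r"\b" + re.escape(old_target) + r"\b", re.IGNORECASE)
    new_text, count = pattern.subn(lambda match: new_target, text)
    return new_text if count else None

def augment(samples, targets):
    """Create extra (text, target, label) samples by swapping in other targets.

    Assumption: the stance label is kept unchanged for the new target.
    """
    augmented = []
    for text, target, label in samples:
        for other in targets:
            if other == target:
                continue
            new_text = swap_target(text, target, other)
            if new_text is not None:
                augmented.append((new_text, other, label))
    return augmented

# Hypothetical example data.
samples = [("I fully support wind power for our town", "wind power", "favor")]
new_samples = augment(samples, ["wind power", "solar power"])
print(new_samples)
```

Sentences that never mention the target explicitly are skipped, since there is nothing to replace.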

MUDES: Multilingual Detection of Offensive Spans

Association for Computational Linguistics, 2021 (article)

Interest in offensive content identification in social media has grown substantially in recent years. Previous work has dealt mostly with post-level annotations. However, identifying offensive spans is useful in many ways. To help cope with this important challenge, we present MUDES, a multilingual system to detect offensive spans in texts. MUDES features pre-trained models, a Python API for developers, and a user-friendly web-based interface. A detailed description of MUDES' components is presented in this paper.

Keywords: multilingual detection, offensive spans
