This paper presents our system submission to Task 5 (Toxic Spans Detection) of SemEval-2021. The task is to detect the spans that make a text toxic. Our system expands the toxic training set using Local Interpretable Model-Agnostic Explanations (LIME) and fine-tunes a RoBERTa model for detection, and we accompany it with an error analysis. We found that expanding the training set with Reddit comments of polarized toxicity, whose toxic spans are labeled by applying LIME on top of a logistic regression classifier, can help RoBERTa learn to recognize toxic spans more accurately. We achieved a span-level F1 score of 0.6715 in the test phase. Our quantitative and qualitative results show that our system's predictions could be a good supplement to the gold annotations of the training set.
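The training-set expansion labels toxic spans in Reddit comments by explaining a logistic regression classifier with LIME. Below is a minimal, dependency-free sketch of that idea, not the authors' implementation. It rests on several assumptions, all hypothetical:

- The classifier is a toy with hand-set word weights (`WEIGHTS`, `BIAS`) rather than a trained model.
- The perturbation scheme drops each word with probability 0.5, and the kernel is an exponential on cosine distance. Both are simplifications of what the `lime` package's `LimeTextExplainer` does.
- The `threshold` that turns word weights into toxic character offsets is an illustrative choice, not a value from the paper.

```python
import math
import random
import re

# HYPOTHETICAL stand-in for a trained logistic-regression toxicity classifier:
# hand-set word weights and bias, not learned from any data.
WEIGHTS = {"idiot": 3.0, "stupid": 2.5, "trash": 1.5}
BIAS = -1.0


def toxic_probability(words):
    """P(toxic | words) under the toy bag-of-words logistic model."""
    z = BIAS + sum(WEIGHTS.get(w.lower(), 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_word_weights(words, num_samples=500, kernel_width=0.25, ridge=1e-3, seed=0):
    """LIME-style local surrogate: perturb by dropping words, fit weighted ridge regression.

    Simplified sampling and kernel (assumption); returns one weight per word.
    """
    rng = random.Random(seed)
    d = len(words)
    rows, targets, sample_weights = [], [], []
    for i in range(num_samples):
        # The first sample is the unperturbed text; the rest drop words at random.
        mask = [1] * d if i == 0 else [rng.randint(0, 1) for _ in range(d)]
        kept = [w for w, keep in zip(words, mask) if keep]
        # Cosine distance between the binary mask and the all-ones original.
        active = sum(mask)
        cos = math.sqrt(active / d) if active else 0.0
        distance = 1.0 - cos
        rows.append([1.0] + [float(v) for v in mask])  # intercept + word indicators
        targets.append(toxic_probability(kept))
        sample_weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
    # Weighted ridge normal equations: (X^T W X + ridge*I) beta = X^T W y,
    # with the intercept left unpenalized.
    p = d + 1
    xtwx = [[sum(sw * r[i] * r[j] for r, sw in zip(rows, sample_weights))
             for j in range(p)] for i in range(p)]
    xtwy = [sum(sw * r[i] * y for r, y, sw in zip(rows, targets, sample_weights))
            for i in range(p)]
    for i in range(1, p):
        xtwx[i][i] += ridge
    beta = solve(xtwx, xtwy)
    return beta[1:]


def toxic_char_offsets(text, threshold=0.1):
    """Label words whose surrogate weight exceeds threshold as toxic; return character offsets."""
    matches = list(re.finditer(r"\w+", text))
    weights = lime_word_weights([m.group() for m in matches])
    offsets = []
    for m, w in zip(matches, weights):
        if w > threshold:
            offsets.extend(range(m.start(), m.end()))
    return offsets


print(toxic_char_offsets("you are an idiot"))  # [11, 12, 13, 14, 15]
```

Emitting character offsets, rather than word labels, matches the format in which the task's gold spans are given. A comment labeled this way can then be added to the training set alongside the gold-annotated posts.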
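Fine-tuning RoBERTa for span detection requires moving between the task's character offsets and the model's subword tokens. The sketch below assumes a token-classification framing with binary labels. That framing is our assumption, since the abstract does not state the labeling scheme. It also assumes per-token `(start, end)` character offsets, such as those Hugging Face fast tokenizers return with `return_offsets_mapping=True`, where special tokens map to `(0, 0)`. Those tokens receive `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. The offset mapping in the usage line is hand-written for illustration.

```python
def token_labels(offset_mapping, toxic_offsets):
    """Label a token 1 if any of its characters is a toxic offset, else 0.

    Special tokens, mapped to (0, 0), get -100 so the training loss ignores them.
    """
    toxic = set(toxic_offsets)
    labels = []
    for start, end in offset_mapping:
        if start == end:
            labels.append(-100)
        elif any(i in toxic for i in range(start, end)):
            labels.append(1)
        else:
            labels.append(0)
    return labels


def predicted_offsets(offset_mapping, predicted_labels):
    """Map per-token predictions back to the character offsets the task scores."""
    offsets = set()
    for (start, end), label in zip(offset_mapping, predicted_labels):
        if label == 1:
            offsets.update(range(start, end))
    return sorted(offsets)


# Hand-written offsets for "you are an idiot": <s>, you, are, an, id, iot, </s>
mapping = [(0, 0), (0, 3), (4, 7), (8, 10), (11, 13), (13, 16), (0, 0)]
print(token_labels(mapping, range(11, 16)))  # [-100, 0, 0, 0, 1, 1, -100]
```

Because a toxic word is often split into several subwords, labeling every token that overlaps a toxic character keeps the mapping lossless in both directions for word-aligned spans.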
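The span-level F1 reported above is, as we understand the task's evaluation, computed per post over sets of character offsets and then averaged over posts. A post with no gold toxic offsets scores 1 only if the prediction is also empty, and 0 otherwise. A minimal sketch of that metric:

```python
def span_f1(predicted, gold):
    """Character-offset F1 for one post: 2|P & G| / (|P| + |G|)."""
    pred_set, gold_set = set(predicted), set(gold)
    if not gold_set:
        return 1.0 if not pred_set else 0.0
    if not pred_set:
        return 0.0
    return 2 * len(pred_set & gold_set) / (len(pred_set) + len(gold_set))


def mean_span_f1(all_predicted, all_gold):
    """Average the per-post F1 over the whole evaluation set."""
    scores = [span_f1(p, g) for p, g in zip(all_predicted, all_gold)]
    return sum(scores) / len(scores)


# Post 1: 3 of 5 gold characters predicted; post 2: no gold span and no prediction.
print(mean_span_f1([list(range(11, 14)), []], [list(range(11, 16)), []]))  # 0.875
```

Post 1 has precision 3/3 and recall 3/5, giving F1 = 0.75. Averaging it with the empty post's 1.0 yields 0.875.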