We use a BLSTM with attention to identify toxic spans in text, and we explore different dimensions that affect the model's performance. The first dimension is the toxic dataset on which the model is trained. Besides the provided dataset, we examine the transferability of five toxic-related datasets, including offensive, toxic, abusive, and hate sets. We find that the solely offensive set shows the highest promise of transferability. The second dimension is methodology: leveraging attention, employing a greedy remove method, using a frequency ratio, and examining hybrid combinations of multiple methods. We conduct an error analysis to examine which types of toxic spans were missed and which were wrongly inferred as toxic, along with the main reasons for these errors. Finally, we extend our method with ensembles, which achieve our highest F1 score of 55.1.
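To make the pipeline described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that per-token attention weights are already available from some model, that tokens are whitespace-delimited, that a token counts as toxic when its weight meets a fixed threshold, that ensembling is a simple majority vote, and that F1 is computed over character offsets. The function names, the threshold value, and the example weights are all hypothetical.

```python
import re
from collections import Counter
from typing import List, Set


def attention_to_offsets(text: str, weights: List[float],
                         threshold: float = 0.3) -> Set[int]:
    """Return character offsets of tokens whose attention weight meets the threshold.

    Assumption: `weights` holds one attention weight per whitespace-delimited
    token; the threshold of 0.3 is an illustrative value, not one from the paper.
    """
    tokens = list(re.finditer(r"\S+", text))
    if len(tokens) != len(weights):
        raise ValueError("expected one attention weight per token")
    offsets: Set[int] = set()
    for match, weight in zip(tokens, weights):
        if weight >= threshold:
            offsets.update(range(match.start(), match.end()))
    return offsets


def majority_vote(predictions: List[Set[int]]) -> Set[int]:
    """Keep an offset if more than half of the predictors marked it (assumed ensemble rule)."""
    counts = Counter(offset for pred in predictions for offset in pred)
    return {offset for offset, n in counts.items() if 2 * n > len(predictions)}


def offset_f1(predicted: Set[int], gold: Set[int]) -> float:
    """F1 between predicted and gold character offsets."""
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "you are a stupid person"
    gold = set(range(10, 16))  # character offsets of "stupid"
    # Hypothetical attention weights from three different models.
    model_a = attention_to_offsets(text, [0.05, 0.05, 0.05, 0.70, 0.15])
    model_b = attention_to_offsets(text, [0.05, 0.05, 0.05, 0.45, 0.40])
    model_c = attention_to_offsets(text, [0.25, 0.25, 0.20, 0.20, 0.10])
    ensemble = majority_vote([model_a, model_b, model_c])
    print(sorted(ensemble), offset_f1(ensemble, gold))
```

In this toy example the second model over-predicts and the third predicts nothing, yet the vote recovers the gold span. This illustrates one way an ensemble can improve on its members, but says nothing about how the reported 55.1 was obtained.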
References used
https://aclanthology.org/
The Toxic Spans Detection task of SemEval-2021 required participants to predict the spans of toxic posts that were responsible for the toxic label of the posts. The task could be addressed as supervised sequence labeling, using training data with gol
This paper presents our submission to SemEval-2021 Task 5: Toxic Spans Detection. The purpose of this task is to detect the spans that make a text toxic, which is a complex labour for several reasons. Firstly, because of the intrinsic subjectivity of
Toxic language is often present in online forums, especially when politics and other polarizing topics arise, and can lead to people becoming discouraged from joining or continuing conversations. In this paper, we use data consisting of comments with
With the rapid growth in technology, social media activity has seen a boom across all age groups. It is humanly impossible to check all the tweets, comments and status manually whether they follow proper community guidelines. A lot of toxicity is reg
Social network platforms are generally used to share positive, constructive, and insightful content. However, in recent times, people often get exposed to objectionable content like threat, identity attacks, hate speech, insults, obscene texts, offen