The Toxic Spans Detection (TSD) task is defined as highlighting the spans that make a text toxic. Much work has been done on classifying a given comment or document as toxic or non-toxic. However, none of those proposed models work at the token level. In this paper, we propose a self-attention-based bidirectional gated recurrent unit (BiGRU) with a multi-embedding representation of the tokens. Our proposed model enriches the token representation with a combination of GPT-2, GloVe, and RoBERTa embeddings, which led to promising results. Experimental results show that our proposed approach is effective at detecting the tokens that belong to toxic spans.
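To make the token-level framing concrete, the following is a minimal sketch of how per-token toxic/non-toxic labels could be turned into highlighted spans. It is not the paper's implementation: it assumes that spans are reported as character offsets, that tokens come from a simple whitespace split, and that labels are binary. The helper names `whitespace_tokenize` and `token_labels_to_char_offsets` and the example labels are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, int, int]  # (surface form, start offset, end offset exclusive)


def whitespace_tokenize(text: str) -> List[Token]:
    """Split on whitespace, keeping each token's [start, end) character offsets."""
    tokens: List[Token] = []
    start = None
    for i, ch in enumerate(text):
        if ch.isspace():
            if start is not None:
                tokens.append((text[start:i], start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        tokens.append((text[start:], start, len(text)))
    return tokens


def token_labels_to_char_offsets(tokens: List[Token], labels: List[int]) -> List[int]:
    """Expand per-token binary labels (1 = toxic) into toxic character offsets."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    offsets: List[int] = []
    for (_, start, end), label in zip(tokens, labels):
        if label == 1:
            offsets.extend(range(start, end))
    return offsets


text = "you are a stupid person"
tokens = whitespace_tokenize(text)
# Hypothetical predictions; in practice these would come from a token-level tagger.
labels = [0, 0, 0, 1, 0]
toxic_offsets = token_labels_to_char_offsets(tokens, labels)
```

A document-level classifier would emit a single label for `text`, whereas a token-level model yields one label per token, which is what allows the toxic span to be localized.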
References used
https://aclanthology.org/