New community

Subscribe to the gold package and get unlimited access to Shamra Academy

UTNLP at SemEval-2021 Task 5: A Comparative Analysis of Toxic Span Detection using Attention-based, Named Entity Recognition, and Ensemble Models

UTNLP في مهمة Semeval-2021: تحليل مقارن للكشف عن المكشوف السامة باستخدام الاهتمام، والاعتراف الكياني المسمى، ونماذج الفرقة

308 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

حوار SRPOL. صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Detecting which parts of a sentence contribute to that sentence's toxicity---rather than providing a sentence-level verdict of hatefulness--- would increase the interpretability of models and allow human moderators to better understand the outputs of the system. This paper presents our team's, UTNLP, methodology and results in the SemEval-2021 shared task 5 on toxic spans detection. We test multiple models and contextual embeddings and report the best setting out of all. The experiments start with keyword-based models and are followed by attention-based, named entity- based, transformers-based, and ensemble models. Our best approach, an ensemble model, achieves an F1 of 0.684 in the competition's evaluation phase.

References used

https://aclanthology.org/

rate research

SINAI at SemEval-2021 Task 5: Combining Embeddings in a BiLSTM-CRF model for Toxic Spans Detection

290 - Association for Computation Linguistics 2021 مقالة

This paper describes the participation of SINAI team at Task 5: Toxic Spans Detection which consists of identifying spans that make a text toxic. Although several resources and systems have been developed so far in the context of offensive language, both annotation and tasks have mainly focused on classifying whether a text is offensive or not. However, detecting toxic spans is crucial to identify why a text is toxic and can assist human moderators to locate this type of content on social media. In order to accomplish the task, we follow a deep learning-based approach using a Bidirectional variant of a Long Short Term Memory network along with a stacked Conditional Random Field decoding layer (BiLSTM-CRF). Specifically, we test the performance of the combination of different pre-trained word embeddings for recognizing toxic entities in text. The results show that the combination of word embeddings helps in detecting offensive content. Our team ranks 29th out of 91 participants.

حوار SRPOL. صناعة حمض الفوسفور

macech at SemEval-2021 Task 5: Toxic Spans Detection

277 - Association for Computation Linguistics 2021 مقالة

Toxic language is often present in online forums, especially when politics and other polarizing topics arise, and can lead to people becoming discouraged from joining or continuing conversations. In this paper, we use data consisting of comments with the indices of toxic text labelled to train an RNN to deter-mine which parts of the comments make them toxic, which could aid online moderators. We compare results using both the original dataset and an augmented set, as well as GRU versus LSTM RNN models.

حوار SRPOL. صناعة حمض الفوسفور

SRPOL DIALOGUE SYSTEMS at SemEval-2021 Task 5: Automatic Generation of Training Data for Toxic Spans Detection

354 - Association for Computation Linguistics 2021 مقالة

This paper presents a system used for SemEval-2021 Task 5: Toxic Spans Detection. Our system is an ensemble of BERT-based models for binary word classification, trained on a dataset extended by toxic comments modified and generated by two language mo dels. For the toxic word classification, the prediction threshold value was optimized separately for every comment, in order to maximize the expected F1 value.

srpol dialogue systems srpol dialogue نظم حوار SRPOL حوار SRPOL. صناعة حمض الفوسفور

BennettNLP at SemEval-2021 Task 5: Toxic Spans Detection using Stacked Embedding Powered Toxic Entity Recognizer

377 - Association for Computation Linguistics 2021 مقالة

With the rapid growth in technology, social media activity has seen a boom across all age groups. It is humanly impossible to check all the tweets, comments and status manually whether they follow proper community guidelines. A lot of toxicity is reg ularly posted on these social media platforms. This research aims to find toxic words in a sentence so that a healthy social community is built across the globe and the users receive censored content with specific warnings and facts. To solve this challenging problem, authors have combined concepts of Linked List for pre-processing and then used the idea of stacked embeddings like BERT Embeddings, Flair Embeddings and Word2Vec on the flairNLP framework to get the desired results. F1 metric was used to evaluate the model. The authors were able to produce a 0.74 F1 score on their test set.

toxic entity recognizer powered toxic entity الكيان السامي المعترف به كيان سام مدعوم صناعة حمض الفوسفور

LZ1904 at SemEval-2021 Task 5: Bi-LSTM-CRF for Toxic Span Detection using Pretrained Word Embedding

285 - Association for Computation Linguistics 2021 مقالة

Recurrent Neural Networks (RNN) have been widely used in various Natural Language Processing (NLP) tasks such as text classification, sequence tagging, and machine translation. Long Short Term Memory (LSTM), a special unit of RNN, has the benefit of memorizing past and even future information in a sentence (especially for bidirectional LSTM). In the shared task of detecting spans which make texts toxic, we first apply pretrained word embedding (GloVe) to generate the word vectors after tokenization. And then we construct Bidirectional Long Short Term Memory-Conditional Random Field (Bi-LSTM-CRF) model by Baidu research to predict whether each word in the sentence is toxic or not. We tune hyperparameters of dropout rate, number of LSTM units, embedding size with 10 epochs and choose the best epoch with validation recall. Our model achieves an F1 score of 66.99 percent in test dataset.

حوار SRPOL. long short term طويل الأجل طويل صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

UTNLP at SemEval-2021 Task 5: A Comparative Analysis of Toxic Span Detection using Attention-based, Named Entity Recognition, and Ensemble Models

UTNLP في مهمة Semeval-2021: تحليل مقارن للكشف عن المكشوف السامة باستخدام الاهتمام، والاعتراف الكياني المسمى، ونماذج الفرقة

Ask ChatGPT about the research

Read More

suggested questions