
SINAI at SemEval-2021 Task 5: Combining Embeddings in a BiLSTM-CRF model for Toxic Spans Detection



Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English



Abstract

This paper describes the SINAI team's participation in SemEval-2021 Task 5: Toxic Spans Detection, which consists of identifying the spans that make a text toxic. Although several resources and systems have been developed for offensive language, both annotation efforts and shared tasks have mainly focused on classifying whether a text is offensive or not. Detecting toxic spans, however, is crucial for identifying why a text is toxic, and it can help human moderators locate this type of content on social media. To accomplish the task, we follow a deep learning approach that uses a bidirectional Long Short-Term Memory network with a stacked Conditional Random Field decoding layer (BiLSTM-CRF). Specifically, we test how combining different pre-trained word embeddings affects the recognition of toxic entities in text. The results show that combining word embeddings helps in detecting offensive content. Our team ranks 29th out of 91 participants.
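As a rough illustration of what "combining" pre-trained word embeddings can mean, the sketch below concatenates each token's vectors from several lookup tables before they would be passed to a sequence tagger. The abstract does not say how the embeddings are combined, so concatenation is an assumption here, and the function name `combine_embeddings`, the zero-vector fallback, and the toy tables are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def combine_embeddings(
    tokens: Sequence[str],
    tables: Sequence[Dict[str, Vector]],
    dims: Sequence[int],
) -> List[Vector]:
    """Concatenate each token's vectors from several lookup tables.

    A token missing from a table gets a zero vector of that table's size
    (an assumed fallback, not something stated in the abstract).
    """
    combined = []
    for token in tokens:
        vector: Vector = []
        for table, dim in zip(tables, dims):
            vector.extend(table.get(token.lower(), [0.0] * dim))
        combined.append(vector)
    return combined


# Toy, made-up tables standing in for two pre-trained embedding sets.
table_a = {"you": [0.1, 0.2], "idiot": [0.9, 0.8]}
table_b = {"idiot": [0.7, 0.6, 0.5]}

features = combine_embeddings(["You", "idiot"], [table_a, table_b], [2, 3])
```

Each token ends up with one vector of length 2 + 3 = 5, which a BiLSTM-CRF tagger would take as that token's input features.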

References used

https://aclanthology.org/


SRPOL DIALOGUE SYSTEMS at SemEval-2021 Task 5: Automatic Generation of Training Data for Toxic Spans Detection

Association for Computational Linguistics, 2021 (article)

This paper presents a system used for SemEval-2021 Task 5: Toxic Spans Detection. Our system is an ensemble of BERT-based models for binary word classification, trained on a dataset extended with toxic comments modified and generated by two language models. For toxic word classification, the prediction threshold was optimized separately for every comment in order to maximize the expected F1 value.
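A per-comment threshold of this kind can be sketched as follows. This is not the authors' procedure: the abstract gives no formula, so the sketch assumes a common approximation in which expected F1 is 2 * E[TP] / (number of predicted words + E[number of toxic words]), treating each word's probability as independent. The function name `best_threshold` is hypothetical.

```python
from typing import List, Tuple


def best_threshold(probs: List[float]) -> Tuple[float, float]:
    """Return (threshold, approximate expected F1) for one comment.

    Words with probability >= threshold are labelled toxic. Only the
    distinct probabilities in the comment are tried as thresholds.
    """
    expected_gold = sum(probs)  # expected number of truly toxic words
    best_t, best_f1 = 1.1, 0.0  # a threshold above 1 selects no words
    for t in sorted(set(probs), reverse=True):
        selected = [p for p in probs if p >= t]
        # E[TP] is the sum of probabilities of the selected words.
        f1 = 2 * sum(selected) / (len(selected) + expected_gold)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


confident_t, confident_f1 = best_threshold([0.9, 0.1, 0.8, 0.05])
uncertain_t, uncertain_f1 = best_threshold([0.2, 0.1])
```

The two toy comments end up with different thresholds (0.8 and 0.2), which is the motivation for tuning the cut-off per comment rather than globally.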


macech at SemEval-2021 Task 5: Toxic Spans Detection

Association for Computational Linguistics, 2021 (article)

Toxic language is often present in online forums, especially when politics and other polarizing topics arise, and can lead to people becoming discouraged from joining or continuing conversations. In this paper, we use data consisting of comments with the indices of toxic text labelled to train an RNN to determine which parts of the comments make them toxic, which could aid online moderators. We compare results using both the original dataset and an augmented set, as well as GRU versus LSTM RNN models.


HLE-UPC at SemEval-2021 Task 5: Multi-Depth DistilBERT for Toxic Spans Detection

Association for Computational Linguistics, 2021 (article)

This paper presents our submission to SemEval-2021 Task 5: Toxic Spans Detection. The purpose of this task is to detect the spans that make a text toxic, which is a complex task for two reasons: first, toxicity is intrinsically subjective, and second, toxicity does not always come from single words such as insults or offenses, but sometimes from whole expressions formed by words that may not be toxic individually. Following this idea of focusing on both single words and multi-word expressions, we study the impact of using a multi-depth DistilBERT model, which uses embeddings from different layers to estimate the final per-token toxicity. Our quantitative results show that using information from multiple depths boosts the performance of the model. Finally, we also analyze our best model qualitatively.


UTNLP at SemEval-2021 Task 5: A Comparative Analysis of Toxic Span Detection using Attention-based, Named Entity Recognition, and Ensemble Models

Association for Computational Linguistics, 2021 (article)

Detecting which parts of a sentence contribute to that sentence's toxicity, rather than providing a sentence-level verdict of hatefulness, would increase the interpretability of models and allow human moderators to better understand the outputs of the system. This paper presents the methodology and results of our team, UTNLP, in the SemEval-2021 shared task 5 on toxic spans detection. We test multiple models and contextual embeddings and report the best setting of all. The experiments start with keyword-based models and are followed by attention-based, named entity-based, transformer-based, and ensemble models. Our best approach, an ensemble model, achieves an F1 of 0.684 in the competition's evaluation phase.
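The abstract does not say how the ensemble merges its members' outputs. One simple possibility, shown here purely as an assumed illustration, is a majority vote over predicted character offsets; the function name `majority_vote` is hypothetical.

```python
from collections import Counter
from typing import List, Set


def majority_vote(predictions: List[Set[int]]) -> List[int]:
    """Keep character offsets predicted by more than half of the models."""
    votes = Counter(offset for pred in predictions for offset in pred)
    needed = len(predictions) / 2
    return sorted(offset for offset, n in votes.items() if n > needed)


# Three made-up model outputs for the same comment.
merged = majority_vote([{0, 1, 2}, {1, 2, 3}, {2, 3, 4}])
```

Here offsets 1, 2, and 3 each receive at least two of three votes, so they form the merged span.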


SemEval-2021 Task 5: Toxic Spans Detection

Association for Computational Linguistics, 2021 (article)

The Toxic Spans Detection task of SemEval-2021 required participants to predict the spans of toxic posts that were responsible for the toxic label of the posts. The task could be addressed as supervised sequence labeling, using training data with gold toxic spans provided by the organisers. It could also be treated as rationale extraction, using classifiers trained on potentially larger external datasets of posts manually annotated as toxic or not, without toxic span annotations. For the supervised sequence labeling approach and evaluation purposes, posts previously labeled as toxic were crowd-annotated for toxic spans. Participants submitted their predicted spans for a held-out test set and were scored using character-based F1. This overview summarises the work of the 36 teams that provided system descriptions.
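Character-based F1 for a single post can be sketched as below, treating the predicted and gold spans as sets of character offsets. The handling of empty sets (a score of 1 when both are empty, 0 when only one is) is an assumption, since the abstract does not spell it out, and the function name `char_f1` is hypothetical.

```python
from typing import Iterable


def char_f1(predicted: Iterable[int], gold: Iterable[int]) -> float:
    """F1 between two sets of character offsets for one post."""
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set and not gold_set:
        return 1.0  # assumed convention: nothing to find, nothing predicted
    if not pred_set or not gold_set:
        return 0.0  # assumed convention: one side empty, no credit
    overlap = len(pred_set & gold_set)
    # Equivalent to 2 * P * R / (P + R) with P and R computed on offsets.
    return 2 * overlap / (len(pred_set) + len(gold_set))


score = char_f1([0, 1, 2, 3], [2, 3, 4, 5])
```

In this toy case two of the four predicted offsets are correct and two of the four gold offsets are found, so precision and recall are both 0.5 and the score is 0.5. A system-level score would then average this value over all posts in the test set.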
