Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Universit\"at Regensburg MaxS at GermEval 2021 Task 1: Synthetic Data in Toxic Comment Classification

Universit \ "في Regensburg Maxs في Germeval 2021 المهمة 1: البيانات الاصطناعية في تصنيف التعليق السام

946 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

تعليق سام صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We report on our submission to Task 1 of the GermEval 2021 challenge -- toxic comment classification. We investigate different ways of bolstering scarce training data to improve off-the-shelf model performance on a toxic comment classification task. To help address the limitations of a small dataset, we use data synthetically generated by a German GPT-2 model. The use of synthetic data has only recently been taking off as a possible solution to ad- dressing training data sparseness in NLP, and initial results are promising. However, our model did not see measurable improvement through the use of synthetic data. We discuss possible reasons for this finding and explore future works in the field.

References used

https://aclanthology.org/

rate research

DeTox at GermEval 2021: Toxic Comment Classification

707 - Association for Computation Linguistics 2021 مقالة

In this work, we present our approaches on the toxic comment classification task (subtask 1) of the GermEval 2021 Shared Task. For this binary task, we propose three models: a German BERT transformer model; a multilayer perceptron, which was first tr ained in parallel on textual input and 14 additional linguistic features and then concatenated in an additional layer; and a multilayer perceptron with both feature types as input. We enhanced our pre-trained transformer model by re-training it with over 1 million tweets and fine-tuned it on two additional German datasets of similar tasks. The embeddings of the final fine-tuned German BERT were taken as the textual input features for our neural networks. Our best models on the validation data were both neural networks, however our enhanced German BERT gained with a F1-score = 0.5895 a higher prediction on the test data.

تحكم اللغة المدربة مسبقا comment classification task toxic comment مهام تصنيف التعليق تعليق سام صناعة حمض الفوسفور

IITK@LCP at SemEval-2021 Task 1: Classification for Lexical Complexity Regression Task

948 - Association for Computation Linguistics 2021 مقالة

This paper describes our contribution to SemEval 2021 Task 1 (Shardlow et al., 2021): Lexical Complexity Prediction. In our approach, we leverage the ELECTRA model and attempt to mirror the data annotation scheme. Although the task is a regression ta sk, we show that we can treat it as an aggregation of several classification and regression models. This somewhat counter-intuitive approach achieved an MAE score of 0.0654 for Sub-Task 1 and MAE of 0.0811 on Sub-Task 2. Additionally, we used the concept of weak supervision signals from Gloss-BERT in our work, and it significantly improved the MAE score in Sub-Task 1.

بناء على الديموغرافية lexical complexity regression انحدار التعقيد المعجمي صناعة حمض الفوسفور

archer at SemEval-2021 Task 1: Contextualising Lexical Complexity

702 - Association for Computation Linguistics 2021 مقالة

Evaluating the complexity of a target word in a sentential context is the aim of the Lexical Complexity Prediction task at SemEval-2021. This paper presents the system created to assess single words lexical complexity, combining linguistic and psycho linguistic variables in a set of experiments involving random forest and XGboost regressors. Beyond encoding out-of-context information about the lemma, we implemented features based on pre-trained language models to model the target word's in-context complexity.

contextualising lexical complexity contextualising lexical السياق التعقيد المعجمي السياق المعجمية صناعة حمض الفوسفور

HunterSpeechLab at GermEval 2021: Does Your Comment Claim A Fact? Contextualized Embeddings for German Fact-Claiming Comment Classification

601 - Association for Computation Linguistics 2021 مقالة

In this paper we investigate the efficacy of using contextual embeddings from multilingual BERT and German BERT in identifying fact-claiming comments in German on social media. Additionally, we examine the impact of formulating the classification pro blem as a multi-task learning problem, where the model identifies toxicity and engagement of the comment in addition to identifying whether it is fact-claiming. We provide a thorough comparison of the two BERT based models compared with a logistic regression baseline and show that German BERT features trained using a multi-task objective achieves the best F1 score on the test set. This work was done as part of a submission to GermEval 2021 shared task on the identification of fact-claiming comments.

claim a fact comment claim german fact-claiming comment المطالبة حقيقة تعليق المطالبة التعليق الألماني في الواقع صناعة حمض الفوسفور المزيد..

hub at SemEval-2021 Task 5: Toxic Span Detection Based on Word-Level Classification

590 - Association for Computation Linguistics 2021 مقالة

This article introduces the system description of the hub team, which explains the related work and experimental results of our team's participation in SemEval 2021 Task 5: Toxic Spans Detection. The data for this shared task comes from some posts on the Internet. The task goal is to identify the toxic content contained in these text data. We need to find the span of the toxic text in the text data as accurately as possible. In the same post, the toxic text may be one paragraph or multiple paragraphs. Our team uses a classification scheme based on word-level to accomplish this task. The system we used to submit the results is ALBERT+BILSTM+CRF. The result evaluation index of the task submission is the F1 score, and the final score of the prediction result of the test set submitted by our team is 0.6640226029.

نهج مستوى كلمة span detection based اكتشاف تمتد مقرها صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Universit\"at Regensburg MaxS at GermEval 2021 Task 1: Synthetic Data in Toxic Comment Classification

Universit \ "في Regensburg Maxs في Germeval 2021 المهمة 1: البيانات الاصطناعية في تصنيف التعليق السام

Ask ChatGPT about the research

Read More

suggested questions