Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Data Science Kitchen at GermEval 2021: A Fine Selection of Hand-Picked Features, Delivered Fresh from the Oven

111 0 0.0 ( 0 )

Download Cite

Added by Christopher Schymura

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Niclas Hildebrandt - Benedikt Boenninghoff - Dennis Orth andn Christopher Schymura

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper presents the contribution of the Data Science Kitchen at GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments. The task aims at extending the identification of offensive language, by including additional subtasks that identify comments which should be prioritized for fact-checking by moderators and community managers. Our contribution focuses on a feature-engineering approach with a conventional classification backend. We combine semantic and writing style embeddings derived from pre-trained deep neural networks with additional numerical features, specifically designed for this task. Ensembles of Logistic Regression classifiers and Support Vector Machines are used to derive predictions for each subtask via a majority voting scheme. Our best submission achieved macro-averaged F1-scores of 66.8%, 69.9% and 72.5% for the identification of toxic, engaging, and fact-claiming comments.

rate research

WLV-RIT at GermEval 2021: Multitask Learning with Transformers to Detect Toxic, Engaging, and Fact-Claiming Comments

68 - Skye Morgan , Tharindu Ranasinghe , Marcos Zampieri 2021

This paper addresses the identification of toxic, engaging, and fact-claiming comments on social media. We used the dataset made available by the organizers of the GermEval-2021 shared task containing over 3,000 manually annotated Facebook comments in German. Considering the relatedness of the three tasks, we approached the problem using large pre-trained transformer models and multitask learning. Our results indicate that multitask learning achieves performance superior to the more common single task learning approach in all three tasks. We submit our best systems to GermEval-2021 under the team name WLV-RIT.

Computation and Language Artificial Intelligence Neural and Evolutionary Computing

A Fresh Look at the Hot Hand Paradox

144 - S. Redner 2019

We discuss the hot hand paradox within the framework of the backward Kolmogorov equation. We use this approach to understand the apparently paradoxical features of the statistics of fixed-length sequences of heads and tails upon repeated fair coin flips. In particular, we compute the average waiting time for the appearance of specific sequences. For sequences of length 2, the average time until the appearance of the sequence HH (heads-heads) equals 6, while the waiting time for the sequence HT (heads-tails) equals 4. These results require a few simple calculational steps by the Kolmogorov approach. We also give complete results for sequences of lengths 3, 4, and 5; the extension to longer sequences is straightforward (albeit more tedious). Finally, we compute the waiting times $T_{nrm H}$ for an arbitrary length sequences of all heads and $T_{nrm(HT)}$ for the sequence of alternating heads and tails. For large $n$, $T_{2nrm H}sim 3 T_{nrm(HT)}$.

Popular Physics Statistical Mechanics History and Overview

FH-SWF SG at GermEval 2021: Using Transformer-Based Language Models to Identify Toxic, Engaging, & Fact-Claiming Comments

67 - Christian Gawron , Sebastian Schmidt 2021

In this paper we describe the methods we used for our submissions to the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments. For all three subtasks we fine-tuned freely available transformer-based models from the Huggingface model hub. We evaluated the performance of various pre-trained models after fine-tuning on 80% of the training data with different hyperparameters and submitted predictions of the two best performing resulting models. We found that this approach worked best for subtask 3, for which we achieved an F1-score of 0.736.

Computation and Language

TransWiC at SemEval-2021 Task 2: Transformer-based Multilingual and Cross-lingual Word-in-Context Disambiguation

72 - Hansi Hettiarachchi , Tharindu Ranasinghe 2021

Identifying whether a word carries the same meaning or different meaning in two contexts is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. Most of the previous work in this area rely on language-specific resources making it difficult to generalise across languages. Considering this limitation, our approach to SemEval-2021 Task 2 is based only on pretrained transformer models and does not use any language-specific processing and resources. Despite that, our best model achieves 0.90 accuracy for English-English subtask which is very compatible compared to the best result of the subtask; 0.93 accuracy. Our approach also achieves satisfactory results in other monolingual and cross-lingual language pairs as well.

Computation and Language Artificial Intelligence Machine Learning

Evaluating the Utility of Hand-crafted Features in Sequence Labelling

70 - Minghao Wu , Fei Liu , Trevor Cohn 2018

Conventional wisdom is that hand-crafted features are redundant for deep learning models, as they already learn adequate representations of text automatically from corpora. In this work, we test this claim by proposing a new method for exploiting handcrafted features as part of a novel hybrid learning approach, incorporating a feature auto-encoder loss component. We evaluate on the task of named entity recognition (NER), where we show that including manual features for part-of-speech, word shapes and gazetteers can improve the performance of a neural CRF model. We obtain a $F_1$ of 91.89 for the CoNLL-2003 English shared task, which significantly outperforms a collection of highly competitive baseline models. We also present an ablation study showing the importance of auto-encoding, over using features as either inputs or outputs alone, and moreover, show including the autoencoder components reduces training requirements to 60%, while retaining the same predictive accuracy.

Computation and Language

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Data Science Kitchen at GermEval 2021: A Fine Selection of Hand-Picked Features, Delivered Fresh from the Oven

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions