Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Precog-LTRC-IIITH at GermEval 2021: Ensembling Pre-Trained Language Models with Feature Engineering

Precog-LTRC-IITH في Germeval 2021: نماذج اللغة المدربة مسبقا مسبقا مع هندسة ميزة

631 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We describe our participation in all the subtasks of the Germeval 2021 shared task on the identification of Toxic, Engaging, and Fact-Claiming Comments. Our system is an ensemble of state-of-the-art pre-trained models finetuned with carefully engineered features. We show that feature engineering and data augmentation can be helpful when the training data is sparse. We achieve an F1 score of 66.87, 68.93, and 73.91 in Toxic, Engaging, and Fact-Claiming comment identification subtasks.

References used

https://aclanthology.org/

rate research

Team Phoenix at WASSA 2021: Emotion Analysis on News Stories with Pre-Trained Language Models

744 - Association for Computation Linguistics 2021 مقالة

Emotion is fundamental to humanity. The ability to perceive, understand and respond to social interactions in a human-like manner is one of the most desired capabilities in artificial agents, particularly in social-media bots. Over the past few years , computational understanding and detection of emotional aspects in language have been vital in advancing human-computer interaction. The WASSA Shared Task 2021 released a dataset of news-stories across two tracks, Track-1 for Empathy and Distress Prediction and Track-2 for Multi-Dimension Emotion prediction at the essay-level. We describe our system entry for the WASSA 2021 Shared Task (for both Track-1 and Track-2), where we leveraged the information from Pre-trained language models for Track-specific Tasks. Our proposed models achieved an Average Pearson Score of 0.417, and a Macro-F1 Score of 0.502 in Track 1 and Track 2, respectively. In the Shared Task leaderboard, we secured the fourth rank in Track 1 and the second rank in Track 2.

team phoenix emotion analysis wassa shared task فريق فينيكس تحليل العاطفة واساء المهمة المشتركة صناعة حمض الفوسفور المزيد..

Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language Models

975 - Association for Computation Linguistics 2021 مقالة

Pre-trained language models have achieved huge success on a wide range of NLP tasks. However, contextual representations from pre-trained models contain entangled semantic and syntactic information, and therefore cannot be directly used to derive use ful semantic sentence embeddings for some tasks. Paraphrase pairs offer an effective way of learning the distinction between semantics and syntax, as they naturally share semantics and often vary in syntax. In this work, we present ParaBART, a semantic sentence embedding model that learns to disentangle semantics and syntax in sentence embeddings obtained by pre-trained language models. ParaBART is trained to perform syntax-guided paraphrasing, based on a source sentence that shares semantics with the target paraphrase, and a parse tree that specifies the target syntax. In this way, ParaBART learns disentangled semantic and syntactic representations from their respective inputs with separate encoders. Experiments in English show that ParaBART outperforms state-of-the-art sentence embedding models on unsupervised semantic similarity tasks. Additionally, we show that our approach can effectively remove syntactic information from semantic sentence embeddings, leading to better robustness against syntactic variation on downstream semantic tasks.

تدريجي نص قصير صناعة حمض الفوسفور

Span Fine-tuning for Pre-trained Language Models

742 - Association for Computation Linguistics 2021 مقالة

Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have shown that incorporating span-level information over consecutive words i n pre-training could further improve the performance of PrLMs. However, given that span-level clues are introduced and fixed in pre-training, previous methods are time-consuming and lack of flexibility. To alleviate the inconvenience, this paper presents a novel span fine-tuning method for PrLMs, which facilitates the span setting to be adaptively determined by specific downstream tasks during the fine-tuning phase. In detail, any sentences processed by the PrLM will be segmented into multiple spans according to a pre-sampled dictionary. Then the segmentation information will be sent through a hierarchical CNN module together with the representation outputs of the PrLM and ultimately generate a span-enhanced representation. Experiments on GLUE benchmark show that the proposed span fine-tuning method significantly enhances the PrLM, and at the same time, offer more flexibility in an efficient way.

استجابة عاطفية صناعة حمض الفوسفور

ur-iw-hnt at GermEval 2021: An Ensembling Strategy with Multiple BERT Models

687 - Association for Computation Linguistics 2021 مقالة

This paper describes our approach (ur-iw-hnt) for the Shared Task of GermEval2021 to identify toxic, engaging, and fact-claiming comments. We submitted three runs using an ensembling strategy by majority (hard) voting with multiple different BERT mod els of three different types: German-based, Twitter-based, and multilingual models. All ensemble models outperform single models, while BERTweet is the winner of all individual models in every subtask. Twitter-based models perform better than GermanBERT models, and multilingual models perform worse but by a small margin.

multiple bert models ensembling strategy bert models نماذج بيرت متعددة استراتيجية شبه نماذج بيرت صناعة حمض الفوسفور المزيد..

Behavior of Modern Pre-trained Language Models Using the Example of Probing Tasks

725 - Association for Computation Linguistics 2021 مقالة

Modern transformer-based language models are revolutionizing NLP. However, existing studies into language modelling with BERT have been mostly limited to English-language material and do not pay enough attention to the implicit knowledge of language, such as semantic roles, presupposition and negations, that can be acquired by the model during training. Thus, the aim of this study is to examine behavior of the model BERT in the task of masked language modelling and to provide linguistic interpretation to the unexpected effects and errors produced by the model. For this purpose, we used a new Russian-language dataset based on educational texts for learners of Russian and annotated with the help of the National Corpus of the Russian language. In terms of quality metrics (the proportion of words, semantically related to the target word), the multilingual BERT is recognized as the best model. Generally, each model has distinct strengths in relation to a certain linguistic phenomenon. These observations have meaningful implications for research into applied linguistics and pedagogy, contribute to dialogue system development, automatic exercise making, text generation and potentially could improve the quality of existing linguistic technologies

modern pre-trained language modern pre-trained اللغة الحديثة المدربة مسبقا مدرب مسبقا الحديثة صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Precog-LTRC-IIITH at GermEval 2021: Ensembling Pre-Trained Language Models with Feature Engineering

Precog-LTRC-IITH في Germeval 2021: نماذج اللغة المدربة مسبقا مسبقا مع هندسة ميزة

Ask ChatGPT about the research

Read More

suggested questions