
TUDA-CCL at SemEval-2021 Task 1: Using Gradient-boosted Regression Tree Ensembles Trained on a Heterogeneous Feature Set for Predicting Lexical Complexity



Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English


Keywords: tree ensembles; gradient-boosted regression tree; heterogeneous feature set


Abstract

In this paper, we present our systems submitted to SemEval-2021 Task 1 on lexical complexity prediction. The aim of this shared task was to create systems able to predict the lexical complexity of word tokens and bigram multiword expressions within a given sentence context, expressed as a continuous value indicating how difficult the respective utterance is to understand. Our approach relies on gradient-boosted regression tree ensembles fitted on a heterogeneous feature set that combines linguistic features, static and contextualized word embeddings, psycholinguistic norm lexica, WordNet, word and character bigram frequencies, and inclusion in wordlists. The resulting model assigns a word or multiword expression a context-dependent complexity score. We show that contextualized string embeddings in particular help with predicting lexical complexity.
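To make the idea of a gradient-boosted regression tree ensemble concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It boosts depth-one regression trees (stumps) under squared loss. The four toy features (word length, log frequency, a wordlist flag, and one embedding dimension), the synthetic target, and all function names are hypothetical and only stand in for the heterogeneous feature set described in the abstract.

```python
import random

def fit_stump(X, residuals):
    """Fit a depth-one regression tree: the single split minimising squared error."""
    best = None  # (error, feature index, threshold, left value, right value)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_gbrt(X, y, n_rounds=40, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def predict_gbrt(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

def make_toy_data(n=120, seed=0):
    """Synthetic data; the features and the target formula are assumptions."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        length = rng.randint(2, 14)        # word length in characters
        log_freq = rng.uniform(0.0, 6.0)   # log corpus frequency
        in_wordlist = rng.choice([0, 1])   # membership in a basic-vocabulary list
        emb = rng.gauss(0.0, 1.0)          # one embedding dimension (pure noise here)
        score = (0.3 + 0.03 * length - 0.05 * log_freq
                 - 0.1 * in_wordlist + rng.gauss(0.0, 0.02))
        X.append([length, log_freq, in_wordlist, emb])
        y.append(min(1.0, max(0.0, score)))  # complexity scores lie in [0, 1]
    return X, y

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

X, y = make_toy_data()
model = fit_gbrt(X, y)
baseline = [sum(y) / len(y)] * len(y)
boosted = [predict_gbrt(model, row) for row in X]
print("baseline MSE:", mean_squared_error(y, baseline))
print("boosted MSE: ", mean_squared_error(y, boosted))
```

Tree ensembles suit mixed inputs like these because splits are scale-invariant, so counts, binary flags, and embedding dimensions can be combined without normalisation. A practical system would more likely use an established library and evaluate on held-out data rather than on the training set.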

References used

https://aclanthology.org/


IITK@LCP at SemEval-2021 Task 1: Classification for Lexical Complexity Regression Task

Association for Computational Linguistics, 2021 (article)

This paper describes our contribution to SemEval 2021 Task 1 (Shardlow et al., 2021): Lexical Complexity Prediction. In our approach, we leverage the ELECTRA model and attempt to mirror the data annotation scheme. Although the task is a regression task, we show that we can treat it as an aggregation of several classification and regression models. This somewhat counter-intuitive approach achieved an MAE of 0.0654 on Sub-Task 1 and an MAE of 0.0811 on Sub-Task 2. Additionally, we used weak supervision signals from Gloss-BERT in our work, which significantly improved the MAE on Sub-Task 1.
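The abstract does not say how classification outputs are turned into a regression score. As a rough illustration only, one generic way is to take the expected value over class probabilities. The sketch below assumes five ordinal classes mapped to the values 0, 0.25, 0.5, 0.75, and 1; that mapping, the function names, and the sample numbers are assumptions, not details taken from the paper. It also shows how mean absolute error (MAE), the metric quoted above, is computed.

```python
# Assumed mapping from five ordinal classes to scores in [0, 1] (hypothetical).
CLASS_VALUES = [0.0, 0.25, 0.5, 0.75, 1.0]

def expected_complexity(class_probs):
    """Collapse a distribution over ordinal classes into one continuous score."""
    if len(class_probs) != len(CLASS_VALUES):
        raise ValueError("expected one probability per class")
    total = sum(class_probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalise in case the inputs do not sum exactly to one.
    return sum(p * v for p, v in zip(class_probs, CLASS_VALUES)) / total

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between gold and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up classifier outputs for two target words.
probs = [[0.5, 0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]]
preds = [expected_complexity(p) for p in probs]
print(preds)
print(mean_absolute_error([0.25, 0.5], preds))
```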

Keywords: lexical complexity regression

C3SL at SemEval-2021 Task 1: Predicting Lexical Complexity of Words in Specific Contexts with Sentence Embeddings

Association for Computational Linguistics, 2021 (article)

We present our approach to predicting the lexical complexity of words in specific contexts, as entered in LCP Shared Task 1 at SemEval 2021. The approach consists of separating sentences into smaller chunks, embedding them with Sent2Vec, and reducing the embeddings to a simpler vector that serves as input to a neural network, which predicts the complexity of words and expressions. Results show that the pre-trained sentence embeddings are not able to capture lexical complexity from the language when applied in cross-domain applications.

Keywords: LCP shared task; specific contexts

UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders

Association for Computational Linguistics, 2021 (article)

In this paper, we present three supervised systems for English lexical complexity prediction of single and multiword expressions for SemEval-2021 Task 1. We explore the use of statistical baseline features, masked language models, and character-level encoders to predict the complexity of a target token in context. Our best system combines information from these three sources. The results indicate that information from masked language models and character-level encoders can be combined to improve lexical complexity prediction.

Keywords: predicting lexical complexity; masked language models

LangResearchLab NC at SemEval-2021 Task 1: Linguistic Feature Based Modelling for Lexical Complexity

Association for Computational Linguistics, 2021 (article)

The present work aims at assigning a complexity score between 0 and 1 to a target word or phrase in a given sentence. For each Single Word Target, a Random Forest Regressor is trained on a feature set consisting of lexical, semantic, and syntactic information about the target. For each Multiword Target, a set of individual word features is taken along with single word complexities in the feature space. The system yielded Pearson correlations of 0.7402 and 0.8244 on the test set for the Single and Multiword Targets, respectively.
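For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Pearson correlation between gold and predicted complexity scores can be computed. The function name and the sample scores are made up for illustration and have no connection to the reported results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical gold and predicted complexity scores in [0, 1].
gold = [0.10, 0.25, 0.40, 0.55, 0.80]
predicted = [0.15, 0.20, 0.45, 0.50, 0.70]
print(pearson(gold, predicted))
```

Pearson correlation measures how well predictions track the gold scores linearly, so a system can score highly while being offset in absolute terms, which is why tasks of this kind often report MAE alongside it.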

Keywords: linguistic feature based; feature based modelling

Cambridge at SemEval-2021 Task 1: An Ensemble of Feature-Based and Neural Models for Lexical Complexity Prediction

Association for Computational Linguistics, 2021 (article)

This paper describes our submission to the SemEval-2021 shared task on Lexical Complexity Prediction. We approached it as a regression problem and present an ensemble combining four systems, one feature-based and three neural with fine-tuning, frequency pre-training and multi-task learning, achieving Pearson scores of 0.8264 and 0.7556 on the trial and test sets respectively (sub-task 1). We further present our analysis of the results and discuss our findings.

Keywords: model learning; based on a pre-trained model
