إن التنبؤ بمستوى تعقيد كلمة أو عبارة تعتبر مهمة صعبة.يتم التعرف عليه حتى كخطوة حاسمة في العديد من تطبيقات NLP، مثل إعادة ترتيب النصوص ومبسط النص.تعامل البحث المبكر المهمة بمثابة مهمة تصنيف ثنائية، حيث توقعت النظم وجود تعقيد كلمة (معقد مقابل غير معقدة).تم تصميم دراسات أخرى لتقييم مستوى تعقيد الكلمات باستخدام نماذج الانحدار أو نماذج تصنيف الوسائط المتعددة.تظهر نماذج التعلم العميق تحسنا كبيرا على نماذج تعلم الآلات مع صعود تعلم التحويل ونماذج اللغة المدربة مسبقا.تقدم هذه الورقة نهجنا الذي فاز في المرتبة الأولى في المهمة السامية 1 (Sub STASK1).لقد حسبنا درجة تعقيد كلمة من 0-1 داخل النص.لقد تم تصنيفنا في المرتبة الأولى في المسابقة باستخدام نماذج اللغة المدربة مسبقا بيرت روبرتا، مع درجة ارتباط بيرسون من 0.788.
Predicting the complexity level of a word or a phrase is considered a challenging task. It is even recognized as a crucial step in numerous NLP applications, such as text rearrangements and text simplification. Early research treated the task as a binary classification task, where the systems anticipated the existence of a word's complexity (complex versus uncomplicated). Other studies had been designed to assess the level of word complexity using regression models or multi-labeling classification models. Deep learning models show a significant improvement over machine learning models with the rise of transfer learning and pre-trained language models. This paper presents our approach that won the first rank in the SemEval-task1 (sub stask1). We have calculated the degree of word complexity from 0-1 within a text. We have been ranked first place in the competition using the pre-trained language models Bert and RoBERTa, with a Pearson correlation score of 0.788.
References used
https://aclanthology.org/
In this paper we describe our participation in the Lexical Complexity Prediction (LCP) shared task of SemEval 2021, which involved predicting subjective ratings of complexity for English single words and multi-word expressions, presented in context.
In this paper, we present three supervised systems for English lexical complexity prediction of single and multiword expressions for SemEval-2021 Task 1. We explore the use of statistical baseline features, masked language models, and character-level
The main contribution of this paper is to fine-tune transformer-based language models pre-trained on several text corpora, some being general (E.g., Wikipedia, BooksCorpus), some being the corpora from which the CompLex Dataset was extracted, and oth
This paper describes a system submitted by team BigGreen to LCP 2021 for predicting the lexical complexity of English words in a given context. We assemble a feature engineering-based model with a deep neural network model founded on BERT. While BERT
Lexical Complexity Prediction (LCP) involves assigning a difficulty score to a particular word or expression, in a text intended for a target audience. In this paper, we introduce a new deep learning-based system for this challenging task. The propos