تنقل تنبؤ التعقيد المعجمي (LCP) باحسن مستوى تعقيد رمز رمزي أو مجموعة من الرموز في جملة.يلعب دورا حيويا في تحسين مهام NLP المختلفة بما في ذلك التبسيط المعجمي والترجمات وتوليد النص.ومع ذلك، فإن المعنى المتعدد لكلمة في ظروف متعددة، وهيكل مجمع نحوي، والاعتماد المتبادل للكلمات في جملة تجعل من الصعب تقدير التعقيد المعجمي.لمعالجة هذه التحديات، قدمت مهمة Semeval-2021 1 مهمة مشتركة تركز على LCP وتعرض هذه الورقة مشاركتنا في هذه المهمة.اقترحنا نهجا قائم على المحولات مع انحدار زوج الجملة.نحن عملنا نماذج محول صعبة ضبطها.بما في ذلك بيرت وروبرتا لتدريب نموذجنا وصماماتها المتوقعة لتقدير التعقيد.توضح النتائج التجريبية أن طريقةنا المقترحة تحققت أداء تنافسي مقارنة بنظم المشاركين.
Lexical complexity prediction (LCP) conveys the anticipation of the complexity level of a token or a set of tokens in a sentence. It plays a vital role in the improvement of various NLP tasks including lexical simplification, translations, and text generation. However, multiple meaning of a word in multiple circumstances, grammatical complex structure, and the mutual dependency of words in a sentence make it difficult to estimate the lexical complexity. To address these challenges, SemEval-2021 Task 1 introduced a shared task focusing on LCP and this paper presents our participation in this task. We proposed a transformer-based approach with sentence pair regression. We employed two fine-tuned transformer models. Including BERT and RoBERTa to train our model and fuse their predicted score to the complexity estimation. Experimental results demonstrate that our proposed method achieved competitive performance compared to the participants' systems.
References used
https://aclanthology.org/
In this paper we propose a contextual attention based model with two-stage fine-tune training using RoBERTa. First, we perform the first-stage fine-tune on corpus with RoBERTa, so that the model can learn some prior domain knowledge. Then we get the
This paper describes a system submitted by team BigGreen to LCP 2021 for predicting the lexical complexity of English words in a given context. We assemble a feature engineering-based model with a deep neural network model founded on BERT. While BERT
This paper describes team LCP-RIT's submission to the SemEval-2021 Task 1: Lexical Complexity Prediction (LCP). The task organizers provided participants with an augmented version of CompLex (Shardlow et al., 2020), an English multi-domain dataset in
In this paper, we propose a method of fusing sentence information and word frequency information for the SemEval 2021 Task 1-Lexical Complexity Prediction (LCP) shared task. In our system, the sentence information comes from the RoBERTa model, and th
We describe the UTFPR systems submitted to the Lexical Complexity Prediction shared task of SemEval 2021. They perform complexity prediction by combining classic features, such as word frequency, n-gram frequency, word length, and number of senses, w