The aim of vocabulary inventory prediction is to predict a learner's whole vocabulary based on a limited sample of query words. This paper approaches the problem starting from the 2-parameter Item Response Theory (IRT) model, giving each word in the vocabulary a difficulty parameter and a discrimination parameter. The discrimination parameter is evaluated on the sub-problem of question item selection, familiar from the fields of Computerised Adaptive Testing (CAT) and active learning. Next, the effect of the discrimination parameter on prediction performance is examined, both in a binary classification setting and in an information retrieval setting. Performance is compared with baselines based on word frequency. A number of different generalisation scenarios are examined, including generalising word difficulty and discrimination using word embeddings with a predictor network, and testing on out-of-dataset data.
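As a rough illustration of the 2-parameter IRT model mentioned above, the sketch below uses the standard textbook form, in which the probability that a learner with ability `theta` knows a word is a logistic function of `a * (theta - b)`, with `a` the discrimination and `b` the difficulty. It also shows one common CAT-style item selection rule, picking the unasked word with maximal Fisher information. The function names, the example words, and their parameter values are all hypothetical, and the paper's exact parameterisation and selection strategy may differ.

```python
import math


def p_known(theta, a, b):
    """Standard 2PL IRT: probability that a learner with ability theta
    knows a word with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_known(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_word(theta, items, asked):
    """Return the unasked word that is most informative at the current
    ability estimate (maximum Fisher information, a common CAT heuristic)."""
    candidates = {w: params for w, params in items.items() if w not in asked}
    return max(candidates, key=lambda w: fisher_information(theta, *candidates[w]))


# Hypothetical word parameters: word -> (discrimination a, difficulty b).
items = {
    "cat": (1.2, -2.0),
    "harvest": (1.5, 0.1),
    "obsequious": (0.8, 2.5),
}

next_word = select_next_word(0.0, items, asked=set())
```

With these made-up parameters, a learner whose ability estimate is near 0 would be queried first on the word whose difficulty is closest to 0 and whose discrimination is highest, since that response is expected to be the most informative.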
References used
https://aclanthology.org/
During the fine-tuning phase of transfer learning, the pretrained vocabulary remains unchanged, while model parameters are updated. The vocabulary generated based on the pretrained data is suboptimal for downstream data when domain discrepancy exists
We propose a straightforward vocabulary adaptation scheme to extend the language capacity of multilingual machine translation models, paving the way towards efficient continual learning for multilingual machine translation. Our approach is suitable f
This paper describes our system for Task 4 of SemEval-2021: Reading Comprehension of Abstract Meaning (ReCAM). We participated in all subtasks where the main goal was to predict an abstract word missing from a statement. We fine-tuned the pre-trained
Transformer-based models have become the de facto standard in the field of Natural Language Processing (NLP). By leveraging large unlabeled text corpora, they enable efficient transfer learning leading to state-of-the-art results on numerous NLP task
There is an emerging interest in the application of natural language processing models to source code processing tasks. One of the major problems in applying deep learning to software engineering is that source code often contains a lot of rare ident