Acquisition of multilingual training data continues to be a challenge in word sense disambiguation (WSD). To address this problem, unsupervised approaches have been proposed to automatically generate sense annotations for training supervised WSD systems. We present three new methods for creating sense-annotated corpora which leverage translations, parallel bitexts, lexical resources, as well as contextual and synset embeddings. Our semi-supervised method applies machine translation to transfer existing sense annotations to other languages. Our two unsupervised methods refine sense annotations produced by a knowledge-based WSD system via lexical translations in a parallel corpus. We obtain state-of-the-art results on standard WSD benchmarks.
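The abstract does not spell out how annotations move from one language to another, so the following is only a minimal sketch of the general idea of annotation projection, not the authors' method. It assumes the source sentence has already been translated and word-aligned by some external tool. The function name `project_senses`, the sense labels, and the alignment pairs are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple


def project_senses(
    source_senses: Dict[int, str],
    alignment: List[Tuple[int, int]],
) -> Dict[int, str]:
    """Copy sense tags from source tokens to the target tokens aligned with them.

    source_senses maps a source token index to a sense label.
    alignment is a list of (source_index, target_index) pairs.
    A target token that would receive two different senses is left untagged.
    """
    projected: Dict[int, str] = {}
    conflicted = set()
    for src_idx, tgt_idx in alignment:
        sense = source_senses.get(src_idx)
        if sense is None or tgt_idx in conflicted:
            continue  # untagged source token, or target already ruled out
        if tgt_idx in projected and projected[tgt_idx] != sense:
            # Two source senses compete for one target token: drop the tag.
            del projected[tgt_idx]
            conflicted.add(tgt_idx)
            continue
        projected[tgt_idx] = sense
    return projected


# Hypothetical example: an English sentence, its French translation,
# illustrative sense labels, and a hand-written word alignment.
source_tokens = ["She", "sat", "on", "the", "bank", "of", "the", "river"]
target_tokens = ["Elle", "s'est", "assise", "sur", "la", "rive", "du", "fleuve"]
source_senses = {4: "bank.n.riverside", 7: "river.n.stream"}
alignment = [(0, 0), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (7, 7)]

projected = project_senses(source_senses, alignment)
for tgt_idx, sense in sorted(projected.items()):
    print(target_tokens[tgt_idx], sense)
```

Dropping conflicting tags rather than picking one is a conservative design choice in this sketch: it trades coverage for precision in the projected training data.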
References used
https://aclanthology.org/