Lemmatization is often used with morphologically rich languages to address issues caused by morphological complexity, a task typically performed by grammar-based lemmatizers. We propose an alternative in the form of a tool that performs lemmatization in the space of word embeddings. Word embeddings as distributed representations natively encode some information about the relationship between base and inflected forms, and we show that it is possible to learn a transformation that approximately maps the embeddings of inflected forms to the embeddings of the corresponding lemmas. This facilitates an alternative processing pipeline that replaces traditional lemmatization with the lemmatizing transformation in downstream processing for any application. We demonstrate the method on Finnish, outperforming traditional lemmatizers in an example task of document similarity comparison, but the approach is language-independent and can be trained for new languages with modest requirements.
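A minimal sketch of the idea, assuming the lemmatizing transformation is a simple linear map fitted by least squares on (inflected form, lemma) embedding pairs; the abstract does not specify the model class, and all names (`fit_lemmatizing_transform`, `emb`, the Finnish example pairs) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def fit_lemmatizing_transform(X_inflected, X_lemma):
    """Learn a matrix W such that X_inflected @ W approximates X_lemma.

    X_inflected: (n_pairs, dim) embeddings of inflected forms
    X_lemma:     (n_pairs, dim) embeddings of the corresponding lemmas
    """
    # Least-squares solution to the linear mapping between the two
    # regions of the embedding space.
    W, *_ = np.linalg.lstsq(X_inflected, X_lemma, rcond=None)
    return W

def lemmatize_embedding(x, W):
    """Map an inflected-form embedding toward its lemma's embedding."""
    return x @ W

# Hypothetical usage, assuming `emb` maps words to vectors and a small
# list of training pairs, e.g. ("taloissa", "talo"):
# pairs = [("taloissa", "talo"), ("kissojen", "kissa"), ...]
# X = np.stack([emb[w] for w, _ in pairs])
# Y = np.stack([emb[l] for _, l in pairs])
# W = fit_lemmatizing_transform(X, Y)
# lemma_vec = lemmatize_embedding(emb["taloissa"], W)
```

Downstream applications could then apply `W` to every token embedding instead of running a traditional lemmatizer, which is the pipeline substitution the abstract describes.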