This paper presents an attempt at multiword expression (MWE) discovery in the Persian language. It focuses on extracting MWEs that contain lemmas from a particular group: loanwords in Persian and their equivalents proposed by the Academy of Persian Language and Literature. To discover such MWEs, four association measures (AMs) are used and evaluated. Finally, the list of extracted MWEs is analyzed, and a comparison between expressions with loanwords and those with their equivalents is presented. To our knowledge, this is the first time such an analysis has been provided for the Persian language.
References used
https://aclanthology.org/
Chinese character decomposition has been used as a feature to enhance Machine Translation (MT) models, combining radicals into character and word level models. Recent work has investigated ideograph or stroke level embedding. However, questions remai
Supervised approaches usually achieve the best performance in the Word Sense Disambiguation problem. However, the unavailability of large sense-annotated corpora for many low-resource languages makes these approaches inapplicable for them in practice.
Character-based word-segmentation models have been extensively applied to agglutinative languages, including Thai, due to their high performance. These models estimate word boundaries from a character sequence. However, a character unit in sequences
Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition, attention
This paper describes systems submitted to SemEval 2021 Task 1: Lexical Complexity Prediction (LCP). We compare a linear and a non-linear regression model trained to work for both tracks of the task. We show that both systems are able to generalize