Discovery of Multiword Expressions with Loanwords and Their Equivalents in the Persian Language

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

This paper presents an attempt at discovering multiword expressions (MWEs) in the Persian language. It focuses on extracting MWEs containing lemmas of a particular group: loanwords in Persian and their equivalents proposed by the Academy of Persian Language and Literature. To discover such MWEs, four association measures (AMs) are used and evaluated. Finally, the list of extracted MWEs is analyzed, and a comparison between expressions with loanwords and expressions with their equivalents is presented. To our knowledge, this is the first time such an analysis has been provided for the Persian language.
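The abstract does not name the four association measures it evaluates. As an illustration only, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), one widely used association measure; it is not claimed to be one of the paper's four. The function names `bigram_pmi` and `rank_candidates`, the `min_count` threshold, and the toy token list are assumptions made for this example. A real pipeline would operate on lemmatized Persian text rather than placeholder tokens.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent token pairs by pointwise mutual information (PMI).

    PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ).
    Pairs seen fewer than `min_count` times are skipped, because PMI
    is unreliable for rare events (hypothetical threshold).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def rank_candidates(tokens, min_count=2):
    """Return (pair, score) tuples sorted from strongest to weakest association."""
    scores = bigram_pmi(tokens, min_count)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Placeholder tokens standing in for a lemmatized corpus.
    toy = ["a", "b", "a", "b", "c", "c", "c", "c"]
    for pair, score in rank_candidates(toy):
        print(pair, round(score, 3))
```

Candidates at the top of such a ranking would then be inspected by hand, which matches the abstract's description of analyzing the extracted list; comparing several measures amounts to swapping the scoring formula and comparing the resulting rankings.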

References used

https://aclanthology.org/

Chinese Character Decomposition for Neural MT with Multi-Word Expressions

Association for Computational Linguistics, 2021 (article)

Chinese character decomposition has been used as a feature to enhance Machine Translation (MT) models, combining radicals into character and word level models. Recent work has investigated ideograph or stroke level embedding. However, questions remain about which decomposition levels of Chinese character representations, radicals or strokes, are best suited for MT. To investigate the impact of Chinese decomposition embedding in detail, i.e., radical, stroke, and intermediate levels, and how well these decompositions represent the meaning of the original character sequences, we carry out analysis with both automated and human evaluation of MT. Furthermore, we investigate whether the combination of decomposed Multiword Expressions (MWEs) can enhance model learning. MWE integration into MT has seen more than a decade of exploration. However, decomposed MWEs have not previously been explored.

Persian SemCor: A Bag of Word Sense Annotated Corpus for the Persian Language

Association for Computational Linguistics, 2021 (article)

Supervised approaches usually achieve the best performance in the Word Sense Disambiguation (WSD) problem. However, the unavailability of large sense-annotated corpora for many low-resource languages makes these approaches inapplicable to them in practice. In this paper, we mitigate this issue for the Persian language by proposing a fully automatic approach for obtaining Persian SemCor (PerSemCor), a Persian Bag-of-Word (BoW) sense-annotated corpus. We evaluated PerSemCor both intrinsically and extrinsically and showed that it can be effectively used as a training set for Persian supervised WSD systems. To encourage future research on Persian Word Sense Disambiguation, we release PerSemCor at http://nlp.sbu.ac.ir.

Character-based Thai Word Segmentation with Multiple Attentions

Association for Computational Linguistics, 2021 (article)

Character-based word-segmentation models have been extensively applied to agglutinative languages, including Thai, due to their high performance. These models estimate word boundaries from a character sequence. However, a character unit in sequences has no essential meaning, compared with word, subword, and character cluster units. We propose a Thai word-segmentation model that uses various types of information, including words, subwords, and character clusters, from a character sequence. Our model applies multiple attentions to refine segmentation inferences by estimating the significant relationships among characters and various unit types. The experimental results indicate that our model can outperform other state-of-the-art Thai word-segmentation models.

On the Difficulty of Segmenting Words with Attention

Association for Computational Linguistics, 2021 (article)

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition, attention can be used to locate and segment the words. We show, however, that even on monolingual data this approach is brittle. In our experiments with different input types, data sizes, and segmentation algorithms, only models trained to predict phones from words succeed in the task. Models trained to predict words from either phones or speech (i.e., the opposite direction needed to generalize to new data) yield much worse results, suggesting that attention-based segmentation is only useful in limited scenarios.

katildakat at SemEval-2021 Task 1: Lexical Complexity Prediction of Single Words and Multi-Word Expressions in English

Association for Computational Linguistics, 2021 (article)

This paper describes systems submitted to SemEval 2021 Task 1: Lexical Complexity Prediction (LCP). We compare a linear and a non-linear regression model trained to work for both tracks of the task. We show that both systems are able to generalize better when supplied with information about complexities of single word and multi-word expression (MWE) targets simultaneously. This approach proved to be the most beneficial for multi-word expression targets. We also demonstrate that some hand-crafted features differ in their importance for the target types.
