ترغب بنشر مسار تعليمي؟ اضغط هنا

Towards Finite-State Morphology of Kurdish

65   0   0.0 ( 0 )
 نشر من قبل Sina Ahmadi
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Morphological analysis is the study of the formation and structure of words. It plays a crucial role in various tasks in Natural Language Processing (NLP) and Computational Linguistics (CL) such as machine translation and text and speech generation. Kurdish is a less-resourced multi-dialect Indo-European language with highly inflectional morphology. In this paper, as the first attempt of its kind, the morphology of the Kurdish language (Sorani dialect) is described from a computational point of view. We extract morphological rules which are transformed into finite-state transducers for generating and analyzing words. The result of this research assists in conducting studies on language generation for Kurdish and enhances the Information Retrieval (IR) capacity for the language while leveraging the Kurdish NLP and CL into a more advanced computational level.



قيم البحث

اقرأ أيضاً

62 - Sina Ahmadi 2021
Sorani Kurdish, also known as Central Kurdish, has a complex morphology, particularly due to the patterns in which morphemes appear. Although several aspects of Kurdish morphology have been studied, such as pronominal endoclitics and Izafa constructi ons, Sorani Kurdish morphology has received trivial attention in computational linguistics. Moreover, some morphemes, such as the emphasis endoclitic =^ic{s}, and derivational morphemes have not been previously studied. To tackle the complex morphology of Sorani, we provide a thorough description of Sorani Kurdish morphological and morphophonological constructions in a formal way such that they can be used as finite-state transducers for morphological analysis and synthesis.
144 - Sina Ahmadi , Mariam Masoud 2020
Machine translation is the task of translating texts from one language to another using computers. It has been one of the major tasks in natural language processing and computational linguistics and has been motivating to facilitate human communicati on. Kurdish, an Indo-European language, has received little attention in this realm due to the language being less-resourced. Therefore, in this paper, we are addressing the main issues in creating a machine translation system for the Kurdish language, with a focus on the Sorani dialect. We describe the available scarce parallel data suitable for training a neural machine translation model for Sorani Kurdish-English translation. We also discuss some of the major challenges in Kurdish language translation and demonstrate how fundamental text processing tasks, such as tokenization, can improve translation performance.
259 - Sina Ahmadi 2018
In this article, we present a rule-based approach for transliterating two mostly used orthographies in Sorani Kurdish. Our work consists of detecting a character in a word by removing the possible ambiguities and mapping it into the target orthograph y. We describe different challenges in Kurdish text mining and propose novel ideas concerning the transliteration task for Sorani Kurdish. Our transliteration system, named Wergor, achieves 82.79% overall precision and more than 99% in detecting the double-usage characters. We also present a manually transliterated corpus for Kurdish.
We present an experimental dataset, Basic Dataset for Sorani Kurdish Automatic Speech Recognition (BD-4SK-ASR), which we used in the first attempt in developing an automatic speech recognition for Sorani Kurdish. The objective of the project was to d evelop a system that automatically could recognize simple sentences based on the vocabulary which is used in grades one to three of the primary schools in the Kurdistan Region of Iraq. We used CMUSphinx as our experimental environment. We developed a dataset to train the system. The dataset is publicly available for non-commercial use under the CC BY-NC-SA 4.0 license.
90 - Sina Ahmadi 2021
Spell checking and morphological analysis are two fundamental tasks in text and natural language processing and are addressed in the early stages of the development of language technology. Despite the previous efforts, there is no progress in open-so urce to create such tools for Sorani Kurdish, also known as Central Kurdish, as a less-resourced language. In this paper, we present our efforts in annotating a lexicon with morphosyntactic tags and also, extracting morphological rules of Sorani Kurdish to build a morphological analyzer, a stemmer and a spell-checking system using Hunspell. This implementation can be used for further developments in the field by researchers and also, be integrated into text editors under a publicly available license.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا