ﻻ يوجد ملخص باللغة العربية
The present paper aims at presenting a lemmatization and a word-level error correction system for Sorani Kurdish. We propose a hybrid approach based on the morphological rules and a n-gram language model. We have called our lemmatization and error correction systems Peyv and R^en^us respectively, which are the first tools presented for Sorani Kurdish to the best of our knowledge. The Peyv lemmatizer has shown 86.7% accuracy. As for R^en^us, using a lexicon, we have obtained 96.4% accuracy while without a lexicon, the correction system has 87% accuracy. As two fundamental text processing tools, these tools can pave the way for further researches on more natural language processing applications for Sorani Kurdish.
Spell checking and morphological analysis are two fundamental tasks in text and natural language processing and are addressed in the early stages of the development of language technology. Despite the previous efforts, there is no progress in open-so
Sorani Kurdish, also known as Central Kurdish, has a complex morphology, particularly due to the patterns in which morphemes appear. Although several aspects of Kurdish morphology have been studied, such as pronominal endoclitics and Izafa constructi
We present an experimental dataset, Basic Dataset for Sorani Kurdish Automatic Speech Recognition (BD-4SK-ASR), which we used in the first attempt in developing an automatic speech recognition for Sorani Kurdish. The objective of the project was to d
Machine translation has been a major motivation of development in natural language processing. Despite the burgeoning achievements in creating more efficient machine translation systems thanks to deep learning methods, parallel corpora have remained
Segmentation is a fundamental step for most Natural Language Processing tasks. The Kurdish language is a multi-dialect, under-resourced language which is written in different scripts. The lack of various segmented corpora is one of the major bottlene