ﻻ يوجد ملخص باللغة العربية
In this article, we present a rule-based approach for transliterating two mostly used orthographies in Sorani Kurdish. Our work consists of detecting a character in a word by removing the possible ambiguities and mapping it into the target orthography. We describe different challenges in Kurdish text mining and propose novel ideas concerning the transliteration task for Sorani Kurdish. Our transliteration system, named Wergor, achieves 82.79% overall precision and more than 99% in detecting the double-usage characters. We also present a manually transliterated corpus for Kurdish.
We present an experimental dataset, Basic Dataset for Sorani Kurdish Automatic Speech Recognition (BD-4SK-ASR), which we used in the first attempt in developing an automatic speech recognition for Sorani Kurdish. The objective of the project was to d
Sequence-to-Sequence (S2S) neural text generation models, especially the pre-trained ones (e.g., BART and T5), have exhibited compelling performance on various natural language generation tasks. However, the black-box nature of these models limits th
Clinical text classification is an important problem in medical natural language processing. Existing studies have conventionally focused on rules or knowledge sources-based feature engineering, but only a few have exploited effective feature learnin
Sorani Kurdish, also known as Central Kurdish, has a complex morphology, particularly due to the patterns in which morphemes appear. Although several aspects of Kurdish morphology have been studied, such as pronominal endoclitics and Izafa constructi
Its natural these days for people to know the local events from massive documents. Many texts contain location information, such as city name or road name, which is always incomplete or latent. Its significant to extract the administrative area of th