Exploring Arabic text diacritization approaches in view of establishing an action plan for developing an open source diacritizer


Abstract in English

The absence of diacritics in Arabic texts is one of the most important challenges facing automatic Arabic language processing. A human reader can infer the correct diacritics of each word while reading, whereas computers need algorithms that restore the diacritics using knowledge at different levels. Diacritization here covers the diacritics damma, fatha, kasra, and sukun, together with shadda and tanween. Some diacritization methods rely on linguistic processing of the text, others rely on statistical models trained on text corpora, and some systems combine the two in hybrid approaches. In this paper we present a comprehensive study of the methods adopted in these diacritization systems. We also review the corpora that have been used for testing and evaluation, then propose specifications for the Arabic corpus that diacritization systems need and the standards that the evaluation process must take into consideration. The main objective is to develop an action plan for building an automatic diacritizer of Arabic texts under the auspices of ALECSO, with the participation of many research entities from different countries.
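To make the task concrete, the following minimal Python sketch maps the diacritics listed above to their Unicode code points, strips them from a word (which yields the kind of undiacritized input a diacritizer receives), and scores a hypothesis against a reference. The scoring function is a simplified diacritic error rate, a metric commonly reported for this task; it is an illustrative assumption and not necessarily the evaluation standard the paper proposes. The function names and the sample words are likewise hypothetical.

```python
# Unicode code points for the diacritics named in the abstract.
DIACRITICS = {
    "\u064E": "fatha",
    "\u064F": "damma",
    "\u0650": "kasra",
    "\u0652": "sukun",
    "\u0651": "shadda",
    "\u064B": "tanween (fathatan)",
    "\u064C": "tanween (dammatan)",
    "\u064D": "tanween (kasratan)",
}


def strip_diacritics(text):
    """Remove every diacritic listed above, leaving the bare letters."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def split_letters(text):
    """Return (character, marks) pairs; marks attach to the preceding character."""
    pairs = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # a mark with no preceding character is ignored
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def diacritic_error_rate(reference, hypothesis):
    """Fraction of characters whose set of marks differs from the reference.

    Simplification (assumption): every non-diacritic character counts,
    including spaces and punctuation.
    """
    ref = split_letters(reference)
    hyp = split_letters(hypothesis)
    if [base for base, _ in ref] != [base for base, _ in hyp]:
        raise ValueError("reference and hypothesis must share the same letters")
    if not ref:
        return 0.0
    # Compare marks as sorted sequences so that mark order does not matter.
    errors = sum(1 for (_, r), (_, h) in zip(ref, hyp) if sorted(r) != sorted(h))
    return errors / len(ref)


# Sample words: kaf, ta, ba, each followed by one short-vowel mark.
REFERENCE = "\u0643\u064E\u062A\u064E\u0628\u064E"   # fatha on all three letters
HYPOTHESIS = "\u0643\u064E\u062A\u0650\u0628\u064E"  # kasra on the middle letter

print(strip_diacritics(REFERENCE))
print(diacritic_error_rate(REFERENCE, HYPOTHESIS))
```

In this example one of the three letters carries the wrong mark, so the error rate is one third. A real evaluation would also need to decide how to treat case endings, undiacritized letters in the reference, and word-level errors.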

