The absence of diacritics in Arabic texts is one of the most important challenges facing
automatic Arabic language processing. When reading, an Arabic reader can infer the correct
diacritics of words, while computers need algorithms to restore the diacritization based on
knowledge at different levels. Diacritization here includes all the diacritics (damma, fatha, kasra,
sukun), in addition to shadda and tanween.
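For illustration, the marks listed above correspond to eight combining characters in the Unicode Arabic block. The following minimal Python sketch removes them, which is one common way to derive undiacritized input from a diacritized text. The function name is illustrative and is not taken from any of the systems surveyed.

```python
# Unicode code points for the marks named above (Arabic block):
# U+064B..U+064D tanween (fathatan, dammatan, kasratan),
# U+064E fatha, U+064F damma, U+0650 kasra, U+0651 shadda, U+0652 sukun.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Remove the eight marks above, leaving the base letters untouched."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


# "kataba" fully diacritized: kaf+fatha, teh+fatha, beh+fatha
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
bare = strip_diacritics(vocalized)  # kaf, teh, beh with no marks
```

Pairing each stripped string with its original gives (input, target) examples of the kind a restoration algorithm has to map between.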
Some diacritization methods are based on the linguistic processing of texts, while others
are statistical and rely on text corpora. Some systems integrate the two
methodologies in hybrid approaches.
In this paper we present a comprehensive study of the different methods that have been adopted in
these diacritization systems. In addition, we review the various corpora that have been used
for testing and evaluation, then suggest specifications for the Arabic corpus needed by
diacritization systems and the standards that the evaluation process must take into
consideration. The main objective is to develop an action plan for the construction of an
automatic diacritizer of Arabic texts under the auspices of ALECSO, with the participation of
many research entities from different countries.
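One metric commonly reported when evaluating diacritizers is the diacritic error rate (DER): the share of letters whose predicted marks differ from the reference. The sketch below is only one possible reading of that metric. It assumes that both strings share the same base letters, that whitespace is not counted, and that a letter with no mark counts like any other; actual evaluation conventions vary and are not specified here.

```python
# The eight Arabic diacritics, U+064B..U+0652 (tanween, short vowels, shadda, sukun).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_marks(text):
    """Return [base_char, marks] pairs, attaching each mark to the preceding base."""
    units = []
    for ch in text:
        if ch in DIACRITICS:
            if units:
                units[-1][1] += ch
        else:
            units.append([ch, ""])
    return units


def diacritic_error_rate(reference, hypothesis):
    """Fraction of non-space letters whose set of marks differs (assumed definition)."""
    ref, hyp = split_marks(reference), split_marks(hypothesis)
    if [u[0] for u in ref] != [u[0] for u in hyp]:
        raise ValueError("reference and hypothesis must share the same base text")
    pairs = [(r[1], h[1]) for r, h in zip(ref, hyp) if not r[0].isspace()]
    if not pairs:
        return 0.0
    # Sorting makes the comparison independent of the order of stacked marks.
    errors = sum(sorted(r) != sorted(h) for r, h in pairs)
    return errors / len(pairs)


reference = "\u0643\u064E\u062A\u064E\u0628\u064E"   # kaf+fatha, teh+fatha, beh+fatha
hypothesis = "\u0643\u064E\u062A\u064F\u0628\u064E"  # same, but teh carries damma
rate = diacritic_error_rate(reference, hypothesis)   # one wrong letter out of three
```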

This paper presents ArOntoLearn, a framework for Arabic ontology learning from textual resources.
Supporting the Arabic language and using domain knowledge in the learning process are the main features of
our framework. In addition, it represents the learned ontology in a Probabilistic Ontology Model (POM), which
can be translated into any knowledge representation formalism, and it implements data-driven change
discovery. It therefore updates the POM according to the corpus changes only, and allows the user to trace
the evolution of the ontology with respect to the changes in the underlying corpus. Our framework
analyses Arabic textual resources and matches them against Arabic lexico-syntactic patterns in order to learn
new concepts and relations.
Supporting the Arabic language is not an easy task, because current linguistic analysis tools are not efficient
enough to process unvocalized Arabic corpora that rarely contain appropriate punctuation. We therefore tried
to build a flexible and freely configurable framework in which any linguistic analysis tool can be replaced by
a more sophisticated one whenever it becomes available.
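The idea of matching lexico-syntactic patterns and updating a probabilistic model only for changed documents can be sketched as follows. This is not ArOntoLearn's implementation: the class name `TinyPOM`, the single pattern ("X مثل Y", roughly "X such as Y"), the relation label, and the confidence formula (share of documents supporting a relation) are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "<concept> مثل <example>" ("X such as Y").
PATTERN = re.compile(r"(\w+)\s+مثل\s+(\w+)")


class TinyPOM:
    """Toy stand-in for a probabilistic ontology model (illustrative only)."""

    def __init__(self):
        self.evidence = defaultdict(set)  # relation -> ids of supporting documents
        self.docs = {}                    # document id -> relations found in it

    def remove_document(self, doc_id):
        """Withdraw the evidence contributed by one document."""
        for rel in self.docs.pop(doc_id, set()):
            self.evidence[rel].discard(doc_id)
            if not self.evidence[rel]:
                del self.evidence[rel]

    def update_document(self, doc_id, text):
        """Re-analyse a single added or changed document; others are untouched."""
        self.remove_document(doc_id)
        found = {("is_a", y, x) for x, y in PATTERN.findall(text)}
        self.docs[doc_id] = found
        for rel in found:
            self.evidence[rel].add(doc_id)

    def confidence(self, rel):
        """Assumed score: fraction of current documents that support the relation."""
        if not self.docs:
            return 0.0
        return len(self.evidence.get(rel, ())) / len(self.docs)


pom = TinyPOM()
pom.update_document("d1", "الفواكه مثل التفاح")  # "fruits such as apples"
pom.update_document("d2", "نص آخر")              # no pattern match
relation = ("is_a", "التفاح", "الفواكه")
```

Because evidence is stored per document, changing or deleting one document only touches the relations that document supported, which is the property the data-driven change discovery described above relies on.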

In this paper we present a web-based Interactive Arabic Dictionary developed at HIAST (Higher Institute
for Applied Sciences and Technology). Users can search online for any Arabic word. The system provides
its different meanings with example sentences and multimedia illustrations, in addition to other related
information such as associated words, semantic domains, expressions, linguistic avails, common mistakes, and
morphological, syntactic, and semantic information. The dictionary can be enriched collaboratively by
expert users with new words, new meanings for existing entries, or other morphological, syntactic, and
semantic information.

Morphological analysis is an important step in natural language processing and its
various applications. Each kind of application needs a certain balance between
performance, accuracy, and generality of solutions (i.e. getting all possible roots). While
we focus on performance with good accuracy in information retrieval applications,
we try to achieve high accuracy in systems like POS taggers and machine translation, and
both high accuracy and high generality in systems like language learning systems and
Arabic lexical dictionaries. In this paper, we describe our approach to building a flexible,
application-oriented Arabic morphological analyzer; this approach is designed to
satisfy the various requirements of most applications that need morphological processing.
It also provides a separate stage (the Original Letters Detection Algorithm) which can be
plugged easily into any other morphological analyzer to improve its performance
with no negative effect on its reliability.
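The abstract does not describe how the Original Letters Detection Algorithm works, so the following is not that algorithm. It only illustrates a related textbook observation: Arabic derivational and inflectional affixes draw on ten augmentation letters (traditionally memorised as سألتمونيها), so any other letter in a stem must belong to the root. The sketch assumes clitics such as attached prepositions and pronoun suffixes have already been segmented off, and it ignores hamza spelling variants.

```python
# The ten traditional augmentation letters; every other letter is assumed radical.
AUGMENTATION_LETTERS = set("سألتمونيها")


def certain_root_letters(stem):
    """Return (index, letter) pairs for letters that cannot be affix material."""
    return [(i, ch) for i, ch in enumerate(stem) if ch not in AUGMENTATION_LETTERS]


# "يكتبون" (they write), root k-t-b: only kaf and beh are certain,
# because teh is itself one of the augmentation letters.
certain = certain_root_letters("يكتبون")
```

A pre-filter of this kind can only narrow the candidate roots a full analyzer has to consider; it cannot reject a correct analysis, which is consistent with the claim of no negative effect on reliability.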

This research covers one stage in the construction of an Arabic speech synthesis
system, namely text-to-phonemes transliteration.
A complete text-to-phonemes transliteration system has been built for
the Arabic language.
In this system we used the TOPH (Orthographic-Phonetic Transcription)
method, originally used for transliterating French, to perform the
transliteration from text to phonemes in Arabic. We also wrote the Arabic
text-to-phonemes rules in the TOPH formal language.
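The TOPH rule syntax is not reproduced here, so the sketch below uses a plain Python table instead. It is a minimal illustration of rule-based text-to-phonemes conversion for fully diacritized input. The phoneme symbols, the tiny letter inventory, and the handling of shadda as consonant doubling are assumptions made for the example, not the rules of the system described.

```python
# Hypothetical, deliberately tiny rule table; this is not TOPH syntax.
CONSONANTS = {
    "\u0628": "b",  # beh
    "\u062A": "t",  # teh
    "\u062F": "d",  # dal
    "\u0631": "r",  # reh
    "\u0633": "s",  # seen
    "\u0643": "k",  # kaf
    "\u0645": "m",  # meem
}
VOWELS = {"\u064E": "a", "\u064F": "u", "\u0650": "i"}  # fatha, damma, kasra
SHADDA, SUKUN = "\u0651", "\u0652"


def to_phonemes(word):
    """Map a diacritized word to a list of phoneme symbols."""
    out = []
    last_consonant = None  # position in `out` of the most recent consonant
    for ch in word:
        if ch in CONSONANTS:
            last_consonant = len(out)
            out.append(CONSONANTS[ch])
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch == SHADDA and last_consonant is not None:
            # Gemination: double the consonant, whichever side of the vowel
            # mark the shadda was typed on.
            out.insert(last_consonant, out[last_consonant])
        elif ch == SUKUN:
            continue  # sukun marks the absence of a vowel
        else:
            raise ValueError(f"no rule for character U+{ord(ch):04X}")
    return out


kataba = to_phonemes("\u0643\u064E\u062A\u064E\u0628\u064E")  # kaf, teh, beh with fatha
madda = to_phonemes("\u0645\u064E\u062F\u0651\u064E")         # meem+fatha, dal+shadda+fatha
```

A real rule set also has to cover context-dependent cases such as the definite article, long vowels, and tanween, which is where a dedicated rule language becomes useful.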

In the present work, we present our Arabic Semi-Syllable Synthesizer. The work consists of seven steps: (1) building a Semi-Syllable Speech Database for the synthesizer, (2) building the Natural Language Processing Module, which comprises a Text Pre-processing Module and Text-to-Phoneme conversion using Arabic Transcription from Orthographic to Phonemes, (3) Phoneme to Semi-Syllables Mapping using a Syllabification Expert System, (4) an Acoustic Word Stress Analysis for Continuous Arabic Speech based on the three prosodic parameters (fundamental frequency, intensity, duration) in order to detect stressed syllables.
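The semi-syllable inventory and the rules of the Syllabification Expert System are not given here. As a rough illustration of the step that precedes any such mapping, the sketch below splits a phoneme sequence into plain syllables using one standard property of Arabic: every syllable starts with a single consonant followed by a vowel. The phoneme symbols, including the doubled symbols for long vowels, are assumptions made for the example.

```python
# Assumed vowel symbols; long vowels are written as doubled symbols.
VOWEL_SYMBOLS = {"a", "i", "u", "aa", "ii", "uu"}


def syllabify(phonemes):
    """Group phonemes into syllables, opening one at each consonant+vowel onset."""
    syllables, current = [], []
    for i, p in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        starts_syllable = p not in VOWEL_SYMBOLS and nxt in VOWEL_SYMBOLS
        if starts_syllable and current:
            syllables.append(current)
            current = []
        current.append(p)
    if current:
        syllables.append(current)
    return syllables


# "madrasa" (school): the d closes the first syllable because r opens the next one.
parts = syllabify(["m", "a", "d", "r", "a", "s", "a"])
```

Splitting each resulting syllable further into synthesis units, and deciding which syllable carries stress, would require the database and prosodic analysis described in the steps above.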

In our work, we chose to follow a semantic-transfer-based approach. Our approach consists of two main phases. The first phase, the Natural Language Analysis phase, aims to analyze the text and extract the required knowledge from it. In addition to the syntactic analysis results, one of the main outputs of this phase is a concept map which summarizes the concepts of the related domain and the relationships between these concepts.