Arabic is a Semitic language characterized by a complex and rich morphology. The exceptional degree of ambiguity in the writing system, the rich morphology, and the highly complex word formation process of roots and patterns all make computational approaches to Arabic very challenging. As a result, a practical handwriting recognition system should support a large vocabulary to provide high coverage and should use context information for disambiguation. Several research efforts have been devoted to building online Arabic handwriting recognition systems. Most of these methods are evaluated either on small private test sets or on a standard database with a limited lexicon and coverage. A large-scale handwriting database is an essential resource for advancing research on online handwriting recognition. Currently, there is no online Arabic handwriting database with a large lexicon, high coverage, a large number of writers, and training/testing data. In this paper, we introduce AltecOnDB, a large-scale online Arabic handwriting database. AltecOnDB covers 98% of all possible PAWs (pieces of Arabic words) of the Arabic language. The collected samples are complete sentences that include digits and punctuation marks. The collected data is available at the sentence, word, and character levels; hence, high-level linguistic models can be used to improve performance. Data was collected from more than 1000 writers of different backgrounds, genders, and ages. Annotation and verification tools were developed to facilitate the annotation and verification phases. We built an elementary recognition system to test our database and to show the difficulties that arise when handling a large vocabulary and large variation in writing styles in the collected data.
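
To make the coverage figure above more concrete, the sketch below shows one way PAW coverage could be measured. A PAW (piece of an Arabic word) is a maximal run of letters that join together in cursive script; a PAW ends at any letter that does not join to the letter after it. The letter set, the function names `split_paws` and `paw_coverage`, and the choice of type-level coverage (distinct PAWs, not frequency-weighted) are all assumptions made for illustration. The abstract does not say how the 98% figure was computed.

```python
# Letters that never join to the following letter, so each one ends a PAW.
# Simplified, assumed set for undiacritized Modern Standard Arabic.
NON_JOINING = set("اأإآدذرزوؤة")
HAMZA = "ء"  # isolated hamza joins to neither neighbour


def split_paws(word):
    """Split an undiacritized Arabic word into its PAWs."""
    paws, current = [], ""
    for ch in word:
        if ch == HAMZA:
            # Close any open PAW, then emit the hamza as its own PAW.
            if current:
                paws.append(current)
            paws.append(ch)
            current = ""
            continue
        current += ch
        if ch in NON_JOINING:
            paws.append(current)
            current = ""
    if current:
        paws.append(current)
    return paws


def paw_coverage(dataset_words, reference_words):
    """Fraction of distinct reference PAWs that also occur in the dataset."""
    reference = {p for w in reference_words for p in split_paws(w)}
    seen = {p for w in dataset_words for p in split_paws(w)}
    return len(reference & seen) / len(reference) if reference else 0.0


if __name__ == "__main__":
    print(split_paws("مدرسة"))
    print(paw_coverage(["كتاب"], ["مدرسة", "كتاب"]))
```

In this sketch, `reference_words` stands in for whatever word list defines "all possible PAWs", and `dataset_words` for the words written in the collected sentences. A frequency-weighted variant would count PAW occurrences instead of distinct PAWs.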
Arabic handwriting is consonantal and cursive. The analysis of Arabic script is further complicated by the obligatory dots and strokes that are placed above or below most letters and are usually written later, out of order (delayed strokes). Because of the ambiguities and the diversity of writing styles, recognition systems are generally based on a set of possible words called a lexicon. When the lexicon is small, recognition accuracy is the main concern because recognition time is minimal. When the lexicon is large, however, both recognition speed and accuracy are critical. Arabic is rich in morphology and syntax, which makes its lexicon large. Therefore, a practical online handwriting recognition system should be able to handle a large lexicon with reasonable performance in terms of both accuracy and time. In this paper, we introduce a fully-fledged Hidden Markov Model (HMM) based system for Arabic online handwriting recognition that provides solutions for most of the difficulties inherent in recognizing the Arabic script. A new preprocessing technique for handling delayed strokes is introduced. We use advanced modeling techniques to build our recognition system from the training data in order to represent the differences between writing units in more detail, minimize the variance between writers in the training data, and better represent the feature space. System results are enhanced by an additional post-processing step with a higher-order language model and cross-word HMM models. The system performance is evaluated on two different databases covering small and large lexicons. Our system outperforms the state-of-the-art systems on the small-lexicon database. Furthermore, it shows promising results (accuracy and time) when supporting a large lexicon, with the possibility of adapting the models to specific writers to get even better results.