Pre-training Transformer-based models such as BERT and ELECTRA on a collection of Arabic corpora, as demonstrated by AraBERT and AraELECTRA, yields impressive results on downstream tasks. However, pre-training Transformer-based language models is computationally expensive, especially for large-scale models. Recently, Funnel Transformer has addressed the sequential redundancy inside the Transformer architecture by compressing the sequence of hidden states, leading to a significant reduction in pre-training cost. This paper empirically studies the performance and efficiency of building an Arabic language model with the Funnel Transformer architecture and the ELECTRA objective. We find that our model achieves state-of-the-art results on several Arabic downstream tasks despite using fewer computational resources than other BERT-based models.
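The cost saving described above comes from pooling the sequence of hidden states between blocks. The following toy sketch (plain Python, invented values; real models pool learned hidden states and later up-sample them) illustrates why: mean-pooling adjacent positions with stride 2 halves the sequence length, so the quadratic self-attention cost drops by roughly a factor of four.

```python
def mean_pool(hidden_states, stride=2):
    """Compress a sequence of hidden-state vectors by averaging
    each non-overlapping window of `stride` positions."""
    pooled = []
    for i in range(0, len(hidden_states) - stride + 1, stride):
        window = hidden_states[i:i + stride]
        dim = len(window[0])
        pooled.append([sum(vec[d] for vec in window) / stride for d in range(dim)])
    return pooled

# A length-4 sequence of 2-dimensional hidden states compresses to length 2.
seq = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
compressed = mean_pool(seq)
print(len(seq), "->", len(compressed))  # 4 -> 2
print(compressed)                       # [[2.0, 3.0], [6.0, 7.0]]
```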
The emergence of Multi-task Learning (MTL) models in recent years has helped push the state of the art in Natural Language Understanding (NLU). We strongly believe that many NLU problems in Arabic are especially poised to reap the benefits of such models. To this end we propose the Arabic Language Understanding Evaluation Benchmark (ALUE), based on 8 carefully selected and previously published tasks. For five of these, we provide new privately held evaluation datasets to ensure the fairness and validity of our benchmark. We also provide a diagnostic dataset to help researchers probe the inner workings of their models. Our initial experiments show that MTL models outperform their singly trained counterparts on most tasks. But in order to entice participation from the wider community, we stick to publishing singly trained baselines only. Nonetheless, our analysis reveals that there is plenty of room for improvement in Arabic NLU. We hope that ALUE will play a part in helping our community realize some of these improvements. Interested researchers are invited to submit their results to our online, publicly accessible leaderboard.
Advances in English language representation have enabled a more sample-efficient pre-training task: Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA). Instead of training a model to recover masked tokens, ELECTRA trains a discriminator model to distinguish true input tokens from corrupted tokens that were replaced by a generator network. Current Arabic language representation approaches, on the other hand, rely only on pretraining via masked language modeling. In this paper, we develop an Arabic language representation model, which we name AraELECTRA. Our model is pretrained using the replaced token detection objective on large Arabic text corpora. We evaluate our model on multiple Arabic NLP tasks, including reading comprehension, sentiment analysis, and named-entity recognition, and we show that AraELECTRA outperforms current state-of-the-art Arabic language representation models, given the same pretraining data and even with a smaller model size.
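The replaced-token-detection setup can be sketched in a few lines. In this toy version (random replacement stands in for the generator network; the vocabulary and words are invented for illustration), note that every position receives a binary label, which is why the discriminator gets a training signal from all tokens rather than only the masked ones.

```python
import random

random.seed(0)

def corrupt(tokens, mask_prob=0.3, vocab=("كتاب", "قلم", "بيت", "مدرسة")):
    """Stand-in for ELECTRA's generator: replace some tokens with
    sampled alternatives, and return the corrupted sequence plus the
    per-token labels the discriminator must predict
    (0 = original, 1 = replaced)."""
    corrupted, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            corrupted.append(random.choice([w for w in vocab if w != tok]))
            labels.append(1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels

tokens = ["كتاب", "قلم", "بيت"]
corrupted, labels = corrupt(tokens)
# Every position carries a label, so the whole sequence contributes
# to the loss -- the source of ELECTRA's sample efficiency.
assert all((c == t) == (l == 0) for c, t, l in zip(corrupted, tokens, labels))
```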
Enabling empathetic behavior in Arabic dialogue agents is an important aspect of building human-like conversational models. While Arabic Natural Language Processing has seen significant advances in Natural Language Understanding (NLU) with language models such as AraBERT, Natural Language Generation (NLG) remains a challenge. The shortcomings of NLG encoder-decoder models are primarily due to the lack of Arabic datasets suitable for training NLG models such as conversational agents. To overcome this issue, we propose a transformer-based encoder-decoder initialized with AraBERT parameters. By initializing the weights of the encoder and decoder with AraBERT pre-trained weights, our model was able to leverage knowledge transfer and boost performance in response generation. To enable empathy in our conversational model, we train it on the ArabicEmpatheticDialogues dataset and achieve high performance in empathetic response generation. Specifically, our model achieved a low perplexity value of 17.0 and an increase of 5 BLEU points over the previous state-of-the-art model. In addition, our proposed model was rated highly by 85 human evaluators, validating its high capability to exhibit empathy while generating relevant and fluent responses in open-domain settings.
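The warm-start idea above (initializing an encoder-decoder from pretrained encoder weights) can be sketched by copying every pretrained tensor whose name the new model shares and leaving the rest randomly initialized. Parameter names and shapes below are illustrative, not AraBERT's actual checkpoint layout.

```python
import random

random.seed(0)

def rand_matrix(rows, cols):
    return [[random.random() for _ in range(cols)] for _ in range(rows)]

# Hypothetical parameter dicts standing in for a pretrained encoder
# (e.g. AraBERT) and a decoder being warm-started from it.
pretrained = {
    "embeddings.weight": rand_matrix(8, 4),
    "layer0.attention.weight": rand_matrix(4, 4),
}

decoder = {
    "embeddings.weight": rand_matrix(8, 4),
    "layer0.attention.weight": rand_matrix(4, 4),
    "layer0.cross_attention.weight": rand_matrix(4, 4),  # no pretrained match
}

def init_from_pretrained(model, pretrained):
    """Copy every pretrained tensor whose name the model shares;
    parameters without a match (here, cross-attention, which has no
    counterpart in an encoder-only checkpoint) keep their random init."""
    transferred = []
    for name in model:
        if name in pretrained:
            model[name] = [row[:] for row in pretrained[name]]
            transferred.append(name)
    return transferred

moved = init_from_pretrained(decoder, pretrained)
print(moved)  # ['embeddings.weight', 'layer0.attention.weight']
```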
The present study aimed to detect the degree to which Arabic language teachers in the Directorate of Education for the North Eastern Badia region exercise creative thinking skills. The study's sample consisted of 200 Arabic language teachers of the sixth and seventh grades. To achieve the objectives of the study, the researcher used a questionnaire composed of 63 items. The results showed that the degree to which Arabic language teachers exercise creative thinking skills in developing their students was moderate on the instrument's total score, as well as in the fields of freedom of expression, a positive perspective towards creativity, teaching methods, methods of evaluation, the class environment, and creativity stimulation. The results also pointed to the absence of statistically significant differences in this degree, across all fields of the study, depending on the variables of gender, experience, and qualifications. Accordingly, the study concluded with a number of related recommendations.
The study aimed at investigating the linguistic performance of Arabic language teachers and its relation to their attitudes towards teaching. The sample of the study consisted of 40 Arabic teachers from the public schools in the Northeastern Badia Directorate of Education. To achieve the purpose of the study, an analytical descriptive approach was used. The instruments of the study were an observation card and a scale of attitudes towards teaching. The results showed that both the linguistic performance of Arabic teachers and their attitudes toward teaching were medium, and indicated a strong correlation between their linguistic performance and their attitudes toward teaching.
This study aimed at analyzing the level of representation of linguistic performance in Arabic language curricula, represented by the language skills of listening, conversation, reading, and writing, in accordance with the teaching outcomes embodied in the objectives, in order to assess the appropriateness of the content of the Arabic language curriculum for the predetermined objectives. The analysis showed the following representation percentages of the content by skill, across all grades: listening comprehension 80.75%; writing skills 84.3%; conversation skills 91.25%; reading comprehension 92.8%. At the curriculum level, the representation percentages of the content across all skills by grade were: first grade 89.5%; second grade 89.125%; third grade 87.875%; fourth grade 87.375%; fifth grade 85.5%; sixth grade 84.375%. The study concluded with a number of recommendations.
In this paper, we introduce an algorithm for grouping Arabic documents to build ontologies and their associated words. We execute the algorithm on five ontologies using Java, processing the documents to obtain 338,667 words with their weights corresponding to each ontology. The algorithm proved its efficiency in improving the performance of the classifiers tested in this study (SVM, NB), compared with previously reported classifier results for the Arabic language.
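The paper does not publish its algorithm, but the core idea of weighting words per ontology and scoring documents against those weights can be sketched as follows. This is a minimal toy version: the ontology names and documents are invented, raw term frequency stands in for whatever weighting scheme the paper uses, and real systems would feed such weights into an SVM or Naive Bayes classifier rather than a bare argmax.

```python
from collections import Counter

# Pool the documents of each ontology and weight every word by its
# frequency in that pool (toy stand-in for the paper's word weights).
ontologies = {
    "sports": ["الفريق فاز المباراة", "المباراة انتهت بالتعادل"],
    "economy": ["ارتفعت الأسعار في السوق", "السوق شهد نموا"],
}

weights = {
    name: Counter(word for doc in docs for word in doc.split())
    for name, docs in ontologies.items()
}

def classify(doc):
    """Assign a document to the ontology whose word weights
    best cover its words."""
    scores = {
        name: sum(w[word] for word in doc.split())
        for name, w in weights.items()
    }
    return max(scores, key=scores.get)

print(classify("نتيجة المباراة"))  # sports
```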
The absence of diacritization in Arabic texts is one of the most important challenges facing automatic Arabic language processing. When reading, an Arabic reader can infer the correct diacritics of words, while computers need algorithms to restore the diacritization based on knowledge at different levels. Diacritization here includes all the diacritics (damma, fatha, kasra, sukun), in addition to shadda and tanween. Some diacritization methods are based on the linguistic processing of texts, while others are based on statistical methods using textual corpora; some systems integrate the two methodologies in hybrid approaches. In this paper we present a comprehensive study of the different methods that have been adopted in these diacritization systems. In addition, we review the various corpora that have been used for testing and evaluation, then suggest the specifications of the Arabic corpus needed for diacritization systems, and the standards that the evaluation process must take into consideration. The main objective is to develop an action plan for the construction of an automatic diacritizer of Arabic texts under the auspices of ALECSO, with the participation of many research entities from different countries.
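To make the statistical, corpus-based family of methods concrete, here is a deliberately minimal sketch: learn, from a tiny diacritized corpus, the most frequent diacritized form of each bare word, then restore diacritics by lookup. Real systems surveyed in such work use context-aware sequence models and linguistic rules; the corpus below is invented for illustration.

```python
from collections import Counter, defaultdict

# Unicode combining marks for the Arabic harakat, tanween, shadda, sukun.
ARABIC_DIACRITICS = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652")

def strip_diacritics(word):
    """Remove all diacritic marks, leaving the bare letters."""
    return "".join(ch for ch in word if ch not in ARABIC_DIACRITICS)

def build_model(diacritized_corpus):
    """Map each bare word to its most frequent diacritized form."""
    counts = defaultdict(Counter)
    for word in diacritized_corpus:
        counts[strip_diacritics(word)][word] += 1
    return {bare: forms.most_common(1)[0][0] for bare, forms in counts.items()}

corpus = ["كَتَبَ", "كَتَبَ", "كُتُبٌ", "قَلَمٌ"]  # toy diacritized corpus
model = build_model(corpus)

def diacritize(text):
    """Restore diacritics word by word; unknown words pass through."""
    return " ".join(model.get(w, w) for w in text.split())

print(diacritize("كتب قلم"))  # كَتَبَ قَلَمٌ (the majority form wins for كتب)
```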
In this paper we present a web-based interactive Arabic dictionary developed at HIAST (Higher Institute for Applied Sciences and Technology). Users can search for any Arabic word online. The system provides the word's different meanings with example sentences and multimedia illustrations, in addition to other related information such as associated words, semantic domains, expressions, linguistic notes, common mistakes, and morphological, syntactic, and semantic information. The dictionary can be enriched collaboratively by expert users with new words, new meanings for existing entries, or other related morphological, syntactic, and semantic information.
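A dictionary organized this way is naturally a headword-to-entry map where each entry carries multiple meanings and related information, and collaborative enrichment appends to existing entries. The sketch below is hypothetical: the field names are illustrative, not HIAST's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Meaning:
    definition: str
    examples: list = field(default_factory=list)
    semantic_domain: str = ""

@dataclass
class Entry:
    headword: str
    meanings: list = field(default_factory=list)
    associated_words: list = field(default_factory=list)
    common_mistakes: list = field(default_factory=list)

lexicon = {}

def add_meaning(headword, meaning):
    """Collaborative enrichment: create the entry on first use,
    then append further meanings to the same entry."""
    entry = lexicon.setdefault(headword, Entry(headword))
    entry.meanings.append(meaning)
    return entry

# The word عين is polysemous: one headword accumulates several senses.
add_meaning("عين", Meaning("eye (organ of sight)", ["رأت عينُه النورَ"]))
add_meaning("عين", Meaning("spring (water source)", semantic_domain="geography"))
print(len(lexicon["عين"].meanings))  # 2
```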