Do you want to publish a course? Click here

Automatic Difficulty Classification of Arabic Sentences

تصنيف الصعوبة التلقائية للجمل العربية

466   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

In this paper, we present a Modern Standard Arabic (MSA) Sentence difficulty classifier, which predicts the difficulty of sentences for language learners using either the CEFR proficiency levels or the binary classification as simple or complex. We compare the use of sentence embeddings of different kinds (fastText, mBERT , XLM-R and Arabic-BERT), as well as traditional language features such as POS tags, dependency trees, readability scores and frequency lists for language learners. Our best results have been achieved using fined-tuned Arabic-BERT. The accuracy of our 3-way CEFR classification is F-1 of 0.80 and 0.75 for Arabic-Bert and XLM-R classification respectively and 0.71 Spearman correlation for regression. Our binary difficulty classifier reaches F-1 0.94 and F-1 0.98 for sentence-pair semantic similarity classifier.



References used
https://aclanthology.org/
rate research

Read More

اخترنا في هذا المشروع العمل على تطوير نظام يقوم بتصنيف المستندات العربية حسب محتواها, يقوم هذه النظام بالتحليل اللفظي لكلمات المستند ثم إجراء عملية Stemming"رد الأفعال إلى أصلها" ثم تطبيق عملية إحصائية على المستند في مرحلة تدريب النظام ثم بالاعتماد على خوارزميات في الذكاء الصنعي يتم تصنيف المستند حسب محتواه ضمن عناقيد
In this paper, we introduce an algorithm for grouping Arabic documents for building an ontology and its words. We execute the algorithm on five ontologies using Java. We manage the documents by getting 338667 words with its weights corresponding to each ontology. The algorithm had proved its efficiency in optimizing classifiers (SVM, NB) performance, which we tested in this study, comparing with former classifiers results for Arabic language.
In our research we offer detailed study of one of the data mining functions within the text data using the object properties in databases. It studies the possibility of applying this function on the Arabic texts. We use procedural query language P L / SQL that deals with the object of Oracle databases. Data mining model Has been built. It works on classification of Arabic texts documents using SVM algorithm for indexing of texts and texts preparation, Naïve Bayes algorithm to classify data after transformation it into nested tables. So we made an evaluation of the obtained results and conclusions.
Text Similarity is an important task in several application fields, such as information retrieval, plagiarism detection, machine translation, topic detection, text classification, text summarization and others. Finding similarity between two texts, p aragraphs or sentences, is based on measuring, directly or indirectly, the similarity between words. There are two known types of words similarity: lexical and semantic. The first one handles the words as a stream of characters: words are similar lexically if they share the same characters in the same order. The second type aims to quantify the degree to which two words are semantically related. As an example they can be, synonyms, represent the same thing or they are used in the same context. In this article we focus our investigation on measuring the semantic similarity between Arabic sentences using several representations
We introduce the new task of domain name dispute resolution (DNDR), that predicts the outcome of a process for resolving disputes about legal entitlement to a domain name. TheICANN UDRP establishes a mandatory arbitration process for a dispute betwee n a trade-mark owner and a domain name registrant pertaining to a generic Top-Level Domain (gTLD) name (one ending in .COM, .ORG, .NET, etc). The nature of the problem leads to a very skewed data set, which stems from being able to register a domain name with extreme ease, very little expense, and no need to prove an entitlement to it. In this paper, we describe thetask and associated data set. We also present benchmarking results based on a range of mod-els, which show that simple baselines are in general difficult to beat due to the skewed data distribution, but in the specific case of the respondent having submitted a response, a fine-tuned BERT model offers considerable improvements over a majority-class model

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا