Study about Arabic Text Documents Classification using Ontologies

دراسة حول تصنيف النصوص العربية باستخدام الأنطولوجيات

Added by Al-Baath University, research paper

Publication date 2014

Research language: Arabic

Authors ريما القمحة (researcher) - حسام الحمصي (researcher)

Created by Shamra Editor

Ontology, Arabic Language, semantic web, Documents classification, Text categorization, Text mining, SVM, NB

Abstract

In this paper, we introduce an algorithm that groups Arabic documents in order to build an ontology and its vocabulary. We run the algorithm on five ontologies, using Java. Processing the documents yields 338,667 words, each with a weight for every ontology. The algorithm proved effective at improving the performance of the two classifiers tested in this study (SVM and NB) compared with previously reported classifier results for Arabic.
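The abstract mentions words that carry a weight for each ontology. As a rough illustration of how such weights could be used, the sketch below scores a tokenized document against each ontology by summing weight times term frequency. This is an assumption made for illustration, not the authors' algorithm (which the page does not spell out, and which was written in Java); the names `ONTOLOGY_WEIGHTS`, `score_document`, and `best_ontology`, and all weights, are hypothetical.

```python
from collections import Counter

# Hypothetical per-ontology term weights (toy values, not taken from the paper).
ONTOLOGY_WEIGHTS = {
    "sports": {"مباراة": 0.9, "فريق": 0.8, "لاعب": 0.7},
    "economy": {"سوق": 0.9, "أسعار": 0.8, "بنك": 0.7},
}


def score_document(tokens, weights):
    """Return a score per ontology: sum of (term weight * term frequency)."""
    counts = Counter(tokens)  # missing terms count as 0
    return {
        name: sum(w * counts[term] for term, w in terms.items())
        for name, terms in weights.items()
    }


def best_ontology(tokens, weights):
    """Return the name of the highest-scoring ontology (first one wins ties)."""
    scores = score_document(tokens, weights)
    return max(scores, key=scores.get)


# Whitespace tokenization only; real Arabic text would need proper preprocessing.
doc = "فاز فريق المدينة في مباراة أمس".split()
print(best_ontology(doc, ONTOLOGY_WEIGHTS))
```

In the study itself the weighted words feed the SVM and NB classifiers; the direct scoring above is only the simplest way to see what a per-ontology weight table expresses.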

Research summary

This study presents a new algorithm for classifying Arabic texts using ontologies. The algorithm was run on five ontologies using the Java programming language, and the texts were processed to obtain 338,667 terms with their weights for each ontology. The algorithm proved effective at improving the performance of classifiers such as NB and SVM compared with earlier classifier results for Arabic. The texts were divided into categories such as news, economy, sports, science and technology, and places and locations. The texts were collected with the Google search engine and processed with tools such as RapidMiner to obtain the terms and their weights. The classifiers were trained and tested using the NB and SVM algorithms, and the results showed that the SVM classifier outperformed the NB classifier. The classifiers were evaluated with measures such as F-measure, precision, and recall; the SVM classifier reached an accuracy of 99.31% and the NB classifier 99.00%. The study concludes that the proposed algorithm is effective at improving the accuracy of ontology-based Arabic text classification.
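The evaluation measures named in the summary (precision, recall, F-measure) can be computed from true and predicted labels. The sketch below uses the standard definitions for a single class; the function name `precision_recall_f1` and the toy labels are illustrative assumptions and do not reproduce the paper's data or results.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels for illustration only (not from the study).
y_true = ["sports", "sports", "economy", "news", "sports"]
y_pred = ["sports", "economy", "economy", "news", "sports"]
p, r, f = precision_recall_f1(y_true, y_pred, "sports")
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

For a multi-class setting like the five categories here, the per-class values are typically averaged, though the page does not say which averaging the study used.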

Critical review

This study is an important step in ontology-based classification of Arabic texts, but several points could be improved. First, the texts were collected with the Google search engine, which may bring in texts unrelated to the target ontology; manually reviewing the retrieved texts could have improved the accuracy of the results. Second, the effect of data size on classifier performance was not examined sufficiently, since only 2008 texts were used; additional experiments on larger datasets may be useful. Finally, the algorithm could be improved by incorporating advanced machine learning techniques, such as deep neural networks, to raise classification accuracy.

Questions related to the research

What algorithm was used to classify Arabic texts in this study?

A new ontology-based algorithm for classifying Arabic texts was used, implemented in the Java programming language.

Which classifiers were used in this study?

Two classifiers were used: Naive Bayes (NB) and Support Vector Machine (SVM).

What accuracy did the classifiers achieve in this study?

The SVM classifier reached an accuracy of 99.31% and the NB classifier 99.00%.

Into which categories were the texts classified in this study?

The texts were classified into categories such as news, economy, sports, science and technology, and places and locations.

Keywords

Ontology, document classification, text classification, text mining, SVM, semantic web, Arabic language

References used

Al-Ghuribi, S., Alshomrani, S. 2014. Bi-languages mining algorithm for classifying text documents (BiLTc). International Journal of Academic Research Part A, Vol. 6, No. 5, 16-25.

Gruber, T. 1993. A translation approach to providing portable ontology specifications. Knowledge Acquisition, Vol. 5, No. 2, 199-220.

Hastie, T., Tibshirani, R., Friedman, J. 2013. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2nd ed., Berlin, 764 p.

Arabic documents classification system

3513 - Tishreen University 2012 Graduation project

In this project we chose to develop a system that classifies Arabic documents by their content. The system performs lexical analysis of the words in a document, then stemming (reducing verbs to their roots), then applies a statistical process to the document during the training phase. Artificial intelligence algorithms then classify the document by its content into clusters.

Machine learning, NLP, Support vector machine, Fuzzy system, Arabic NLP

Classification Of Arabic Texts Using Object Properties In Databases

2370 - Al-Baath University 2016 Research paper

In this research we present a detailed study of one data mining function applied to text data, using object properties in databases, and examine whether it can be applied to Arabic texts. We use the procedural query language PL/SQL, which works with objects in Oracle databases. We built a data mining model that classifies Arabic text documents, using the SVM algorithm for text indexing and preparation and the Naïve Bayes algorithm to classify the data after transforming it into nested tables. We then evaluate the results obtained and draw conclusions.

Data Mining Algorithms, Object Oriented Database, Text Objects, Data Mining Texts, SVM Algorithm, Naïve Bayes Algorithm, Unstructured Data

Text-to-Phonemes in Arabic

3149 - Damascus University 2003 Research paper

This research covers one stage in building an Arabic speech synthesis system: text-to-phonemes transliteration. A complete text-to-phonemes transliteration system has been built for Arabic. The system uses the TOPH (Orthographic-Phonetic Transcription) method, previously used for transliterating French, to convert Arabic text to phonemes. We also wrote the Arabic text-to-phonemes rules in the TOPH formal language.

Text-to-phoneme, Speech synthesis, TOPH system

Automatic Difficulty Classification of Arabic Sentences

470 - Association for Computational Linguistics 2021 Article

In this paper, we present a Modern Standard Arabic (MSA) sentence difficulty classifier, which predicts the difficulty of sentences for language learners using either the CEFR proficiency levels or a binary simple/complex classification. We compare sentence embeddings of different kinds (fastText, mBERT, XLM-R and Arabic-BERT) with traditional language features such as POS tags, dependency trees, readability scores and frequency lists for language learners. Our best results were achieved with fine-tuned Arabic-BERT. Our 3-way CEFR classification reaches F-1 of 0.80 with Arabic-BERT and 0.75 with XLM-R, and regression reaches a Spearman correlation of 0.71. Our binary difficulty classifier reaches F-1 of 0.94, and the sentence-pair semantic similarity classifier reaches F-1 of 0.98.

Pretrained Arabic language, automatic difficulty classification, standard Arabic

Automatic detection of plagiarism in Arabic documents based on lexical chains

849 - University of Sfax 2011 Research paper

This paper deals with the automatic detection of plagiarism in Arabic documents. We present a new approach based on lexical chains. The proposed method extracts lexical chains from the original document and uses a search engine to check whether those chains occur in other documents. In a second step, the method uses an automatic translation system to translate the lexical chains and checks, again with a search engine, whether they occur in documents in other languages. We then compute a correlation ratio between the original lexical chains and those extracted from the documents returned by the search engine, in order to detect plagiarism in the original document. Finally, we present our prototype, called « Alkachef », developed to detect plagiarism in Arabic documents.

Natural language processing, plagiarism detection, scientific plagiarism, automatic plagiarism detection, lexical chains
