
Plagiarism Detection in Arabic Language using Rhetorical Structure Theory

كشف الانتحال في اللغة العربية باستخدام نظرية بنية الكلام البلاغية

 Publication date: 2014
Research language: Arabic
 Created by Shamra Editor





This paper presents a review of the available plagiarism detection algorithms and systems, and an implementation of a plagiarism detection system that uses search engines available on the web. Plagiarism detection in natural-language documents is a complicated problem, and it is tied to the characteristics of the language itself. Many algorithms are available for plagiarism detection in natural languages. Generally, these algorithms belong to two main categories: the first is plagiarism detection algorithms based on fingerprints, and the second is plagiarism detection algorithms based on content comparison, which include string matching and tree matching algorithms. Available plagiarism detection systems usually use one specific type of detection algorithm, or a mixture of detection algorithms, to achieve effective (fast and accurate) detection. In this research, a plagiarism detection system has been developed using the Bing search engine and a plagiarism detection algorithm based on Rhetorical Structure Theory.
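To make the fingerprint category concrete, here is a minimal sketch of one common fingerprinting approach: hash every run of k consecutive words and compare the resulting hash sets. It is an illustration only, not the system described in the paper. The function names, the choice of k = 3, the MD5-based hash, and the naive tokenizer are all assumptions made for this example.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (runs of k consecutive words)."""
    # Naive tokenization: lowercase, keep runs of word characters.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text: str, k: int = 3) -> set[int]:
    """Hash each shingle to a 32-bit integer; the hash set is the fingerprint."""
    return {
        int.from_bytes(hashlib.md5(" ".join(s).encode("utf-8")).digest()[:4], "big")
        for s in shingles(text, k)
    }


def resemblance(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two fingerprints, between 0 and 1."""
    fa, fb = fingerprint(a, k), fingerprint(b, k)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    original = "the quick brown fox jumps"
    suspect = "the quick brown fox sleeps"
    print(resemblance(original, suspect))
```

A score near 1 means the two texts share most of their word runs; paraphrased plagiarism tends to score low with this kind of measure, which is one reason content-based and structure-based comparisons are also used.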


Artificial intelligence review:
Research summary
This paper addresses plagiarism detection in texts written in Arabic using Rhetorical Structure Theory. The study presents a comprehensive review of the available plagiarism detection algorithms and systems, focusing on file fingerprint comparison algorithms and file content comparison algorithms. A system was developed that relies on the Bing search engine and on an algorithm that draws on the characteristics of the language through Rhetorical Structure Theory. The system was tested on a sample of scientific documents written in Arabic, and the results showed that it is effective at detecting plagiarism, with an accuracy of up to 75%. The paper includes a detailed explanation of Rhetorical Structure Theory and its applications in text processing, as well as the design of the system and the algorithm used to detect plagiarism. It also compares the different algorithms used in plagiarism detection and outlines the advantages and disadvantages of each.
Critical review
Although this paper makes an important contribution to plagiarism detection in Arabic texts using Rhetorical Structure Theory, several points could be improved. First, the tests would have been stronger had they covered texts from different domains rather than scientific research only, to confirm that the system is comprehensive and effective across contexts. Second, the developed system was not compared directly with other plagiarism detection systems available on the web, which makes it hard to judge how far the new system outperforms them. Third, the system could be improved by introducing a semantic dimension into the algorithm that compares connectives, using a conceptual dictionary to improve detection accuracy. Finally, a more detailed analysis of the results, explaining why some cases of plagiarism went undetected, would have been useful.
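The third point of the review can be illustrated with a small sketch. It is not the paper's algorithm, whose details are not given on this page. It assumes that a text's structure can be approximated by its ordered sequence of connectives, and that a conceptual dictionary maps each connective to a broader concept. The English connective list, the `CONCEPT` mapping, and the function names are hypothetical stand-ins; a real system for Arabic would need a curated Arabic list.

```python
from difflib import SequenceMatcher

# Hypothetical conceptual dictionary: connective -> concept it expresses.
CONCEPT = {
    "because": "cause",
    "since": "cause",
    "therefore": "result",
    "thus": "result",
    "however": "contrast",
    "but": "contrast",
}


def connective_sequence(text: str, semantic: bool = False) -> list[str]:
    """Return the connectives found in text, in order of appearance.

    With semantic=True each connective is replaced by its concept, so that
    different connectives with the same role compare as equal.
    """
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    found = [w for w in words if w in CONCEPT]
    return [CONCEPT[w] for w in found] if semantic else found


def structural_similarity(a: str, b: str, semantic: bool = False) -> float:
    """Similarity, between 0 and 1, of the two connective sequences."""
    sa = connective_sequence(a, semantic)
    sb = connective_sequence(b, semantic)
    if not sa and not sb:
        return 0.0
    return SequenceMatcher(None, sa, sb).ratio()


if __name__ == "__main__":
    source = "It was late, but we stayed because we cared."
    reworded = "It was late; however, we stayed since we cared."
    print(structural_similarity(source, reworded))
    print(structural_similarity(source, reworded, semantic=True))
```

In this sketch the reworded sentence shares no connective with the source, so the literal comparison finds nothing in common, while the concept-level comparison sees the same contrast-then-cause pattern. That is the kind of gain the reviewer's suggestion is aiming at.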
Questions related to the research
  1. What are the main algorithms used in plagiarism detection according to this paper?

    The main algorithms are file fingerprinting algorithms and file content comparison algorithms.

  2. What accuracy did the developed system achieve in plagiarism detection?

    The developed system achieved an accuracy of up to 75% in plagiarism detection.

  3. Which theory was used to develop the plagiarism detection algorithm in this paper?

    Rhetorical Structure Theory was used to develop the plagiarism detection algorithm.

  4. What future improvements are proposed for the system developed in this paper?

    One proposed future improvement is to introduce a semantic dimension into the algorithm that compares connectives, using a conceptual dictionary to improve the accuracy of plagiarism detection.



Read More

This paper presents a reference study of the available algorithms for plagiarism detection and develops a semantic plagiarism detection algorithm for medical research papers that employs the medical ontologies available on the World Wide Web. Plagiarism detection in medical research written in natural languages is a complex issue that is tied to the specific domain of medical research. Many algorithms are used for plagiarism detection in natural language. They are generally divided into two main categories: algorithms that compare files using file fingerprints, and file content comparison algorithms, which include string matching algorithms and text and tree matching algorithms. Recently, much research has addressed semantic plagiarism detection algorithms, and semantic plagiarism detection algorithms have been developed based on citation analysis models in scientific research. In this research, a plagiarism detection system was developed using the Bing search engine. Two types of ontologies are used in this system: a general ontology such as WordNet, and several standard international ontologies in the medical domain, such as the Diseases ontology, which contains descriptions and definitions of diseases and the derivation relations between them.
The advancement of the web and information technology has contributed to the rapid growth of digital libraries and automatic machine translation tools, which easily translate texts from one language into another. These have increased the content accessible in different languages, which makes it easy to plagiarize through translation, a practice referred to as "cross-language plagiarism". Recognizing plagiarism among texts in different languages is more challenging than identifying plagiarism within a corpus written in a single language. This paper proposes a new technique for enhancing English-Arabic cross-language plagiarism detection at the sentence level. The technique is based on semantic and syntactic feature extraction using word order, word embedding, and word alignment with multilingual encoders. Those features, and their combinations with different machine learning (ML) algorithms, are then used to classify sentences as either plagiarized or non-plagiarized. The proposed approach has been deployed and assessed using datasets presented at SemEval-2017. Analysis of the experimental data demonstrates that the extracted features, combined with various ML classifiers, achieve promising results.
In this paper we review and list the advantages and limitations of the significant, effective techniques employed or developed in text plagiarism detection. We found that many of the proposed methods for plagiarism detection have weaknesses and fail to detect some types of plagiarism. The paper surveys plagiarism detection and covers several important subjects: the definition of plagiarism, plagiarism prevention and detection, plagiarism detection systems, plagiarism detection processes, and some of the current plagiarism detection techniques. It compares different plagiarism detection algorithms and shows their weak and strong points. It also describes the power of semantic plagiarism detection methods and shows that they detect cases of plagiarism that other detection algorithms cannot, since semantic methods were developed to overcome the weaknesses of traditional plagiarism detection methods.
This paper deals with the automatic detection of plagiarism in Arabic documents. We present a new idea based on experimentation with lexical chains. The proposed method extracts these chains from the original document and uses a search engine to check whether they occur in other documents. The second step of our method uses an automatic translation system to translate the lexical chains and uses a search engine to check whether those chains occur in documents in other languages. We then compute a correlation ratio between the lexical chains of the original document and the lexical chains extracted from the documents returned by the search engine, in order to detect plagiarism in the original document. At the end of the paper we present our prototype, called « Alkachef », developed to detect plagiarism in Arabic documents.
We describe our system, which ranked first in the Hope Speech Detection (HSD) shared task and fourth in the Offensive Language Identification (OLI) shared task, both in the Tamil language. The goal of HSD and OLI is to identify whether a code-mixed comment or post contains hope speech or offensive content, respectively. We pre-train a transformer-based RoBERTa model using synthetically generated code-mixed data and use it in an ensemble along with the pre-trained ULMFiT model available from iNLTK.