Plagiarism Detection in Arabic Language using Rhetorical Structure Theory

كشف الانتحال في اللغة العربية باستخدام نظرية بنية الكلام البلاغية

Added by Damascus University. Research paper

Publication date 2014

Research language: Arabic

Authors: خالد عمر (researcher), باسل الخطيب (researcher)

Abstract

This paper presents a review of available plagiarism detection algorithms and systems, and an implementation of a plagiarism detection system that uses search engines available on the web. Plagiarism detection in natural language documents is a complicated problem that is tied to the characteristics of the language itself. Many algorithms are available for plagiarism detection in natural languages. Generally, these algorithms belong to two main categories: the first is plagiarism detection algorithms based on fingerprints, and the second is plagiarism detection algorithms based on content comparison, which includes string matching and tree matching algorithms. Available plagiarism detection systems usually use one specific type of detection algorithm, or a mixture of detection algorithms, to achieve effective (fast and accurate) detection. In this research, a plagiarism detection system has been developed using the Bing search engine and a plagiarism detection algorithm based on Rhetorical Structure Theory.
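
The two categories named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the word k-gram size `k`, and the choice of Jaccard overlap and `difflib.SequenceMatcher` are all assumptions made for illustration. The first pair of functions compares compact hashed fingerprints; the last compares the content directly by string matching over word sequences.

```python
import hashlib
from difflib import SequenceMatcher


def fingerprint(text: str, k: int = 3) -> set[str]:
    """Hash every run of k consecutive words into a fingerprint set.

    k=3 is an arbitrary illustrative choice, not a value from the paper.
    """
    words = text.lower().split()
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {hashlib.md5(s.encode("utf-8")).hexdigest()[:8] for s in shingles}


def fingerprint_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap of the two fingerprint sets, from 0.0 to 1.0."""
    fa, fb = fingerprint(a, k), fingerprint(b, k)
    if not fa or not fb:
        return 0.0  # a text shorter than k words has no fingerprint
    return len(fa & fb) / len(fa | fb)


def content_similarity(a: str, b: str) -> float:
    """Direct string matching over the two word sequences, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()
```

Fingerprints are cheap to store and compare across many documents, which is why they suit a first filtering pass; direct content comparison is slower but tolerates small edits between otherwise matching passages.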

Research summary

This paper addresses plagiarism detection in Arabic-language texts using Rhetorical Structure Theory. It gives a comprehensive review of the algorithms and systems available for plagiarism detection, focusing on file fingerprint comparison algorithms and file content comparison algorithms. A system was developed that relies on the Bing search engine and on an algorithm that draws on properties of the language through Rhetorical Structure Theory. The system was tested on a sample of scientific documents written in Arabic, and the results showed that it detects plagiarism with an accuracy of up to 75%. The paper includes a detailed explanation of Rhetorical Structure Theory and its applications in text processing, as well as the design of the system and of the detection algorithm. It also compares the different algorithms used in plagiarism detection and sets out the advantages and disadvantages of each.

Critical review

Critical assessment: Although this paper makes an important contribution to plagiarism detection in Arabic texts using Rhetorical Structure Theory, several points could be improved. First, the tests would have been better extended to texts from different domains, not only scientific research, to establish that the system is comprehensive and effective across contexts. Second, the developed system was not compared directly with other plagiarism detection systems available on the web, which makes it hard to judge how far the new system outperforms them. Third, the system could be improved by introducing a semantic dimension into the connective comparison algorithm, using a conceptual dictionary to improve detection accuracy. Finally, a more detailed analysis of the results, explaining why some plagiarism cases went undetected, would have been useful.
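
The review above refers to comparing connectives, but this page does not describe how the paper does it. The following is only a rough sketch of one way such a comparison could work: it extracts the ordered sequence of discourse connectives from each text and measures how closely the two sequences align. The connective list, the whitespace tokenization, and the use of `difflib.SequenceMatcher` are all assumptions, not details from the paper.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny list of Arabic discourse connectives.
# The list the paper actually uses is not given on this page.
CONNECTIVES = {"لكن", "لأن", "ثم", "أو", "بل", "لذلك", "بينما", "إذا"}


def connective_sequence(text: str) -> list[str]:
    """Return the connectives of a text in their order of appearance.

    Whitespace tokenization is a simplification: connectives attached to
    the following word as clitics are not found by this sketch.
    """
    tokens = text.replace("،", " ").replace(".", " ").split()
    return [t for t in tokens if t in CONNECTIVES]


def connective_similarity(a: str, b: str) -> float:
    """Alignment score of the two connective sequences, from 0.0 to 1.0."""
    seq_a, seq_b = connective_sequence(a), connective_sequence(b)
    if not seq_a or not seq_b:
        return 0.0  # nothing to compare
    return SequenceMatcher(None, seq_a, seq_b).ratio()
```

Two passages that swap every content word but keep the same order of connectives score 1.0 here, which is the kind of rewording that plain string matching misses. The semantic extension proposed in the review would need a conceptual dictionary on top of this, which the sketch does not attempt.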

Questions related to the research

What are the main algorithms used in plagiarism detection according to this paper?

The main algorithms are file fingerprinting algorithms (Fingerprinting) and file content comparison algorithms (Content Comparisons).

What accuracy did the developed system achieve in plagiarism detection?

The developed system achieved an accuracy of up to 75% in plagiarism detection.

Which theory was used to develop the plagiarism detection algorithm in this paper?

Rhetorical Structure Theory was used to develop the plagiarism detection algorithm.

What future improvements are proposed for the developed system?

One proposed future improvement is to introduce a semantic dimension into the connective comparison algorithm, using a conceptual dictionary to improve plagiarism detection accuracy.

Keywords

Plagiarism detection, Arabic language, Rhetorical Structure Theory, natural language processing, algorithms, Bing search engine

References used

Shizhong Wu, Yongle Hao, Xinyu Gao, Baojiang Cui, Ce Bian, Homology Detection Based on Abstract Syntax Tree Combined Simple Semantics Analysis, Web Intelligence and Intelligent Agent Technology (WI-IAT), vol. 3, pp. 410-414, 2010

Vinod K.R., Sandhya S., Sathish Kumar D., Harani A., David Banji, Otilia J.F. Banji, Plagiarism-history detection and prevention, Journal for drugs and medicines, vol. 3, issue 1, pp. 1-4, 2011

Al-Khatib B., Aspel A., Saleh M., Fares M., Hamad M.M., Plagiarism detection using the web, Damascus University, Informatics Engineering College, 2007

Al-Sanie W., Towards an infrastructure for Arabic text Summarization using Rhetorical Structure Theory, Master's thesis, King Saud University, K.S.A., 2005

Bing, API Basics. [online] Available at: http://www.bing.com/developers/s/APIBasics.html [Accessed 15 October 2011]

Related research

Plagiarism Detection in Medical Research Using Medical Ontology

2679 - Tishreen University 2016 research paper

This paper presents a reference study of available algorithms for plagiarism detection and develops a semantic plagiarism detection algorithm for medical research papers that employs the medical ontologies available on the World Wide Web. Plagiarism detection in medical research written in natural languages is a complex problem that is tied to the specific domain of medical research. Many algorithms are used for plagiarism detection in natural language. They are generally divided into two main categories: algorithms that compare files using file fingerprints, and algorithms that compare file content, which include string matching algorithms and text and tree matching algorithms. Recently, much research has addressed semantic plagiarism detection algorithms, and semantic plagiarism detection algorithms based on citation analysis models in scientific research have been developed. In this research, a plagiarism detection system was developed using the "Bing" search engine. Two types of ontology are used in this system: a general-purpose ontology such as WordNet, and several standard international ontologies in the medical domain, such as the Diseases ontology, which contains descriptions and definitions of diseases and the derivation relations between them.

Natural language processing, semantic web, plagiarism detection, medical ontologies

English-Arabic Cross-language Plagiarism Detection

857 - Association for Computational Linguistics 2021 article

The advancement of the web and information technology has contributed to the rapid growth of digital libraries and automatic machine translation tools, which easily translate texts from one language into another. These have increased the content accessible in different languages, which makes translated plagiarism, referred to as "cross-language plagiarism", easy to carry out. Recognition of plagiarism among texts in different languages is more challenging than identifying plagiarism within a corpus written in the same language. This paper proposes a new technique for enhancing English-Arabic cross-language plagiarism detection at the sentence level. This technique is based on semantic and syntactic feature extraction using word order, word embedding and word alignment with multilingual encoders. Those features, and their combination with different machine learning (ML) algorithms, are then used to classify sentences as either plagiarized or non-plagiarized. The proposed approach has been deployed and assessed using datasets presented at SemEval-2017. Analysis of experimental data demonstrates that utilizing extracted features and their combinations with various ML classifiers achieves promising results.

Cross-language plagiarism detection, English-Arabic cross-language plagiarism, cross-language plagiarism

Survey Of Traditional And Semantic Plagiarism Detection Algorithms

2390 - Tishreen University 2016 research paper

In this paper we review and list the advantages and limitations of the most effective techniques employed or developed in text plagiarism detection. Many of the proposed plagiarism detection methods were found to have weak points and to miss some types of plagiarism. The paper surveys plagiarism detection, covering several important subjects: the definition of plagiarism, plagiarism prevention and detection, plagiarism detection systems, plagiarism detection processes, and some current plagiarism detection techniques. It compares different plagiarism detection algorithms and shows their weak points and their strengths. It also describes the power of semantic plagiarism detection methods and shows that they detect plagiarism cases that other plagiarism detection algorithms cannot, since semantic methods were developed to overcome the weak points of traditional plagiarism detection methods.

Semantic plagiarism detection algorithms, plagiarism detection process, plagiarism detection techniques

Automatic detection of plagiarism in Arabic documents based on lexical chains

1356 - University of Sfax 2011 research paper

This paper deals with automatic detection of plagiarism in Arabic documents. We present a new idea based on experimentation with lexical chains. The proposed method extracts those chains from the original document and uses a search engine to check whether such chains occur in other documents. The second step of our method uses an automatic translation system to translate the lexical chains and checks, using a search engine, whether those chains occur in documents in other languages. We then compute a correlation ratio between the lexical chains of the original document and the lexical chains extracted from the documents returned by the search engine, in order to detect plagiarism in the original document. At the end of the paper we present our prototype, called « Alkachef », developed to detect plagiarism in Arabic documents.

Natural language processing, plagiarism detection, scientific plagiarism, automatic plagiarism detection, lexical chains

Spartans@LT-EDI-EACL2021: Inclusive Speech Detection using Pretrained Language Models

610 - Association for Computational Linguistics 2021 article

We describe our system that ranked first in the Hope Speech Detection (HSD) shared task and fourth in the Offensive Language Identification (OLI) shared task, both for the Tamil language. The goal of HSD and OLI is to identify whether a code-mixed comment or post contains hope speech or offensive content, respectively. We pre-train a transformer-based model, RoBERTa, using synthetically generated code-mixed data and use it in an ensemble along with the pre-trained ULMFiT model available from iNLTK.

Inclusive speech detection
