Do you want to publish a course? Click here

Automatic detection of plagiarism in Arabic documents based on lexical chains

كشف حالات الإنتحال في النصوص المدونة باللغة العربية بالإعتماد على السلاسل اللغوية

835   1   0   0.0 ( 0 )
 Publication date 2011
and research's language is العربية
 Created by Shamra Editor




Ask ChatGPT about the research

This paper deals with automatic detection of plagiarism in Arabic documents. We present in this paper a new idea based on the experimentation of lexical chains. The proposed method extracts those chains from original document and uses a search engine to verify if such chains occur in other documents. The second step in our methods uses automatic translation system to translate lexical chains and verify by using search engine if those chain occurs in document in other languages. Then we compute a correlation ratio between lexical chains and lexical chains extracted from documents provided by the search engine to detect plagiarism in the original document. We present in the end of this paper our prototype called « Alkachef » developed to detect plagiarism in Arabic document .



References used
Belguith L., Baccour L., Mourad G., “Segmentation de textes arabes basée sur l'analyse contextuelle des signes de ponctuations et de certaines particules”, Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles TALN’2005, , Vol. 1, p. 451–456.Dourdan France, 6–10, Juin 2005.
Morris, J., Hirst G., “Lexical cohesion computed by thesaural relations as an indicator of the structure of text”. in Computational Linguistics 17(1): pp. 21 43, 1991
Seaward L., Matwin S., Intrinsic Plagiarism Detection using Complexity Analysis”, in PAN'09, pp. 56-61, 2009.
rate research

Read More

This paper presents a review of available algorithms and plagiarism detection systems، and an implementation of Plagiarism Detection System using available search engines on the web. Plagiarism detection in natural language documents is a complicat ed problem and it is related to the characteristics of the language itself. There are many available algorithms for plagiarism detection in natural languages .Generally these algorithms belong to two main categories ; the first one is plagiarism detection algorithms based on fingerprint and the second is plagiarism detection algorithms based on content comparison and includes string matching and tree matching algorithms . Usually available systems of plagiarism detection use specific type of detection algorithms or use a mixture of detection algorithms to achieve effective detection systems (fast and accurate). In this research, a plagiarism detection system has been developed using Bing search engine and a plagiarism detection algorithm based on Rhetorical Structure Theory.
The advancement of the web and information technology has contributed to the rapid growth of digital libraries and automatic machine translation tools which easily translate texts from one language into another. These have increased the content acces sible in different languages, which results in easily performing translated plagiarism, which are referred to as cross-language plagiarism''. Recognition of plagiarism among texts in different languages is more challenging than identifying plagiarism within a corpus written in the same language. This paper proposes a new technique for enhancing English-Arabic cross-language plagiarism detection at the sentence level. This technique is based on semantic and syntactic feature extraction using word order, word embedding and word alignment with multilingual encoders. Those features, and their combination with different machine learning (ML) algorithms, are then used in order to aid the task of classifying sentences as either plagiarized or non-plagiarized. The proposed approach has been deployed and assessed using datasets presented at SemEval-2017. Analysis of experimental data demonstrates that utilizing extracted features and their combinations with various ML classifiers achieves promising results.
In this paper, we introduce an algorithm for grouping Arabic documents for building an ontology and its words. We execute the algorithm on five ontologies using Java. We manage the documents by getting 338667 words with its weights corresponding to each ontology. The algorithm had proved its efficiency in optimizing classifiers (SVM, NB) performance, which we tested in this study, comparing with former classifiers results for Arabic language.
We describe our submitted system to the 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic (Abu Farha et al., 2021). We tackled both subtasks, namely Sarcasm Detection (Subtask 1) and Sentiment Analysis (Subtask 2). We used state-of-the-ar t pretrained contextualized text representation models and fine-tuned them according to the downstream task in hand. As a first approach, we used Google's multilingual BERT and then other Arabic variants: AraBERT, ARBERT and MARBERT. The results found show that MARBERT outperforms all of the previously mentioned models overall, either on Subtask 1 or Subtask 2.
The main purpose of the present research is to support Arabic Text- to - Speech synthesizers, with natural prosody, based on linguistic analysis of texts to synthesize, and automatic prosody generation, using rules which are deduced from recorded s ignals analysis, of different types of sentences in Arabic. All the types of Arabic sentences (declarative and constructive) were enumerated with the help of an expert in Arabic linguistics . A textual corpus of about 2500 sentences covering most of these types was built and recorded both in natural prosody and without prosody. Later, these sentences were analyzed to extract prosody effect on the signal parameters, and to build prosody generation rules. In this paper, we present the results on negation sentences, applied on synthesized speech using the open source tool MBROLA. The results can be used with any parametric Arabic synthesizer. Future work will apply the rules on a new Arabic synthesizer based on semi-syllables units, which is under development in the Higher Institute for Applied Sciences and Technology.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا