
QuranTree.jl: A Julia Package for Quranic Arabic Corpus


Publication date: 2021
Language: English





QuranTree.jl is an open-source package for working with the Quranic Arabic Corpus (Dukes and Habash, 2010). It aims to provide Julia APIs as an alternative to the Java APIs of JQuranTree. QuranTree.jl currently offers functionality for intuitive indexing of the chapters, verses, words, and parts of words of the Qur'an; for creating custom transliterations; for character dediacritization and normalization; and for handling morphological features. Finally, it works well with Julia's TextAnalysis.jl and Python's CAMeL Tools.
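As a quick sketch of what this indexing workflow might look like in practice, the snippet below loads the corpus, indexes a chapter and a verse, and then dediacritizes and normalizes the text. The names used (QuranData, load, CorpusData, verses, arabic, dediac, normalize) follow the package's documentation, but treat the exact calls as an illustrative assumption rather than a verified recipe.

    using QuranTree

    # Load the bundled Quranic Arabic Corpus and Tanzil data.
    data = QuranData()
    crps, tnzl = load(data)

    # Wrap the corpus table so chapters, verses, and words can be
    # indexed intuitively (chapter 1, verse 1 here).
    crpsdata = CorpusData(crps)
    vrs = verses(crpsdata[1][1])

    # Convert the transliteration to Arabic script, then strip
    # diacritics and normalize the characters.
    ar = arabic.(vrs)
    plain = normalize.(dediac.(ar))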



References used
https://aclanthology.org/
Related research


Recent progress in natural language processing has led to Transformer architectures becoming the predominant model used for natural language tasks. However, many real-world datasets include additional modalities that the Transformer does not directly leverage. We present Multimodal-Toolkit, an open-source Python package for incorporating text and tabular (categorical and numerical) data with Transformers for downstream applications. Our toolkit integrates well with Hugging Face's existing APIs, such as tokenization and the model hub, which allow easy download of different pre-trained models.
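The core pattern behind such a toolkit, independent of this particular Python package, is to concatenate a transformer's pooled text representation with encoded tabular features before a classification head. The Julia sketch below mocks that fusion with random stand-ins, since wiring up a real transformer is out of scope; every name and value here is illustrative.

    # Mock pooled text representation from a transformer encoder
    # (e.g., a 768-dimensional [CLS] vector) -- a random stand-in here.
    text_feat = randn(768)

    # Tabular side: one-hot encode a categorical column and
    # standardize a numeric one (illustrative mean and std).
    categories = ["red", "green", "blue"]
    onehot(v) = Float64.(categories .== v)
    cat_feat = onehot("green")
    num_feat = [(42.0 - 40.0) / 5.0]

    # Fuse the modalities by concatenation, then apply a linear head.
    combined = vcat(text_feat, cat_feat, num_feat)
    W = randn(2, length(combined)); b = randn(2)
    logits = W * combined .+ b

    softmax(z) = exp.(z .- maximum(z)) ./ sum(exp.(z .- maximum(z)))
    println(softmax(logits))   # probabilities over two mock classes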
Arabic is the official language of 22 countries, spoken by more than 400 million speakers. Each of these countries uses at least one dialect in daily conversation, so Arabic has at least 22 dialects, and each dialect can be written in Arabic or Arabizi script. Most recent research focuses on constructing a language model and a training corpus for each dialect in each script; following this approach means constructing 46 different resources (including Modern Standard Arabic, MSA) to handle a single language. In this paper, we extract ONE corpus, and we propose ONE algorithm to automatically construct ONE training corpus using ONE classification model architecture for sentiment analysis of MSA and the different dialects. After manually reviewing the training corpus, the obtained results outperform all results reported in the literature for the targeted test corpora.
Word embeddings are a core component of modern natural language processing systems, making the ability to thoroughly evaluate them a vital task. We describe DiaLex, a benchmark for intrinsic evaluation of dialectal Arabic word embeddings. DiaLex covers five important Arabic dialects: Algerian, Egyptian, Lebanese, Syrian, and Tunisian. Across these dialects, DiaLex provides a testbank for six syntactic and semantic relations, namely male to female, singular to dual, singular to plural, antonym, comparative, and genitive to past tense. DiaLex thus consists of a collection of word pairs representing each of the six relations in each of the five dialects. To demonstrate the utility of DiaLex, we use it to evaluate a set of existing and new Arabic word embeddings that we developed. Beyond evaluation of word embeddings, DiaLex supports efforts to integrate dialects into the Arabic language curriculum. It can be easily translated into Modern Standard Arabic and English, which can be useful for evaluating word translation. Our benchmark, evaluation code, and new word embedding models will be publicly available.
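To make the benchmark format concrete, the toy sketch below shows how relation word pairs like DiaLex's can drive the standard vector-offset analogy test for intrinsic evaluation. This is not DiaLex's actual evaluation code; the embeddings, words, and values are illustrative stand-ins.

    using LinearAlgebra

    # Toy embeddings: word => vector (illustrative values only).
    emb = Dict(
        "malik"  => [0.9, 0.1, 0.3],   # a male form
        "malika" => [0.8, 0.9, 0.3],   # its female counterpart
        "walad"  => [0.7, 0.1, 0.5],   # "boy"
        "bint"   => [0.6, 0.9, 0.5],   # "girl"
    )

    cosine(a, b) = dot(a, b) / (norm(a) * norm(b))

    # Vector-offset analogy: given a:b :: c:?, return the word whose
    # vector is closest to b - a + c, excluding the query words.
    function analogy(emb, a, b, c)
        target = emb[b] - emb[a] + emb[c]
        argmax(w -> cosine(emb[w], target),
               [w for w in keys(emb) if w ∉ (a, b, c)])
    end

    # A male-to-female pair should predict the female form of "walad".
    println(analogy(emb, "malik", "malika", "walad"))  # expect "bint"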
Social media (SM) platforms such as Twitter provide large quantities of real-time data that can be leveraged during mass emergencies. Developing tools to support crisis-affected communities requires available datasets, which often do not exist for low-resource languages. This paper introduces Kawarith, a multi-dialect Arabic Twitter corpus for crisis events, comprising more than a million Arabic tweets collected during 22 crises that occurred between 2018 and 2020 and involved several types of hazard. Exploration of this content revealed the most discussed topics and information types, and the paper presents a labelled dataset from seven emergency events that serves as a gold standard for several tasks in crisis informatics research. Using annotated data from the same event, a BERT model is fine-tuned to classify tweets into different categories in the multi-label setting. Results show that BERT-based models yield good performance on this task even with small amounts of task-specific training data.
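As a generic illustration of the multi-label setting mentioned above (not the paper's training code), each category gets an independent sigmoid score, and every category whose score crosses a threshold is predicted; the category names and logits below are mock values.

    # Illustrative crisis-related categories and mock per-label logits,
    # as would come from the classification head of a fine-tuned encoder.
    labels = ["damage", "casualties", "donations", "caution"]
    logits = [2.1, -0.4, 0.7, -1.3]

    sigmoid(z) = 1 / (1 + exp(-z))
    probs = sigmoid.(logits)

    # Multi-label: keep every label whose probability exceeds the threshold,
    # so a tweet can belong to several categories at once.
    threshold = 0.5
    predicted = [l for (l, p) in zip(labels, probs) if p > threshold]
    println(predicted)   # ["damage", "donations"]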
Studies on Arabic language computing are of great importance given the wide spread of the Arabic language. In this study we chose to work on Arabic language processing through an information retrieval system for Arabic-language documents. The basic idea of the system is to analyze Arabic documents and texts and build indexes of the terms they contain, then extract weight vectors that represent these documents for later processing of a query, which is compared against these vectors to retrieve the documents that match it. Stemming the terms in the documents yielded better retrieval performance, and we review several of the stemming algorithms developed in previous studies. Document clustering is an important addition, as it lets the user discover documents similar to a search result and relevant to the submitted query. As a practical application, we built a desktop information retrieval system that reads texts of various types and displays the results together with their corresponding clusters.
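As a rough sketch of the vector-space retrieval idea described above (not the paper's implementation, and with stemming and clustering omitted), the snippet below builds TF-IDF weight vectors for a tiny toy collection and ranks documents against a query by cosine similarity; the documents, tokenizer, and weighting details are simplified assumptions.

    using LinearAlgebra

    docs = ["the cat sat", "the dog sat", "the cat ran fast"]
    query = "cat sat"

    tokenize(s) = split(lowercase(s))
    vocab = unique(vcat(tokenize.(docs)...))

    # Term-frequency vector of a text over the shared vocabulary.
    tf(text) = [count(==(t), tokenize(text)) for t in vocab]

    # Smoothed inverse document frequency.
    N = length(docs)
    df = [count(d -> t in tokenize(d), docs) for t in vocab]
    idf = log.(N ./ (1 .+ df)) .+ 1

    tfidf(text) = tf(text) .* idf

    cosine(a, b) = dot(a, b) / (norm(a) * norm(b))

    # Compare the query's weight vector against each document's
    # and rank the documents by similarity.
    qv = tfidf(query)
    scores = [cosine(tfidf(d), qv) for d in docs]
    for i in sortperm(scores; rev=true)
        println(round(scores[i]; digits=3), "  ", docs[i])
    end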


