Natural language modelling has gained a lot of interest recently. The current state-of-the-art results are achieved by first training a very large language model and then fine-tuning it on multiple tasks. However, there is little work on smaller, more compact language models for resource-limited devices or applications, let alone on how to efficiently train such models for a low-resource language like Arabic. In this paper, we investigate how such models can be trained in a compact way for Arabic. We also show how distillation and quantization can be applied to create even smaller models. Our experiments show that our largest model, which is 2x smaller than the baseline, can achieve better results on multiple tasks with 2x less pretraining data.
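For readers unfamiliar with the two compression techniques the abstract mentions, the sketch below illustrates them in generic PyTorch. It is not the paper's exact recipe: the temperature T, the mixing weight alpha, the toy student network, and the choice of dynamic INT8 quantization are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Knowledge distillation: blend soft teacher targets with hard labels."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Post-training dynamic quantization: Linear-layer weights are stored as INT8
# and dequantized on the fly, shrinking the model without retraining.
# The tiny classifier here is a hypothetical stand-in for a trained student.
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```

Distillation is applied during training (the student minimizes `distillation_loss` against a frozen teacher), while quantization is applied once after training, so the two compression steps compose naturally.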