Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Persian SemCor: A Bag of Word Sense Annotated Corpus for the Persian Language

الفارسي SEMCOR: كيس من معنى الكلمة المشروحة Corpus اللغة الفارسية

984 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Supervised approaches usually achieve the best performance in the Word Sense Disambiguation problem. However, the unavailability of large sense annotated corpora for many low-resource languages make these approaches inapplicable for them in practice. In this paper, we mitigate this issue for the Persian language by proposing a fully automatic approach for obtaining Persian SemCor (PerSemCor), as a Persian Bag-of-Word (BoW) sense-annotated corpus. We evaluated PerSemCor both intrinsically and extrinsically and showed that it can be effectively used as training sets for Persian supervised WSD systems. To encourage future research on Persian Word Sense Disambiguation, we release the PerSemCor in http://nlp.sbu.ac.ir.

References used

https://aclanthology.org/

rate research

ParsTwiNER: A Corpus for Named Entity Recognition at Informal Persian

681 - Association for Computation Linguistics 2021 مقالة

As a result of unstructured sentences and some misspellings and errors, finding named entities in a noisy environment such as social media takes much more effort. ParsTwiNER contains about 250k tokens, based on standard instructions like MUC-6 or CoN LL 2003, gathered from Persian Twitter. Using Cohen's Kappa coefficient, the consistency of annotators is 0.95, a high score. In this study, we demonstrate that some state-of-the-art models degrade on these corpora, and trained a new model using parallel transfer learning based on the BERT architecture. Experimental results show that the model works well in informal Persian as well as in formal Persian.

مقياس النطاق entity recognition named entity اسمه الكيان الاعتراف اعتراف الكيان كيان اسمه صناعة حمض الفوسفور المزيد..

ParsiNLU: A Suite of Language Understanding Challenges for Persian

1051 - Association for Computation Linguistics 2021 مقالة

Abstract Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains to be concentrated on resource-rich languages like English. This work focuses on Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this language. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in Persian language that includes a range of language understanding tasks---reading comprehension, textual entailment, and so on. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results on state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.1

language understanding challenges persian language لغة فهم التحديات اللغة الفارسية صناعة حمض الفوسفور

Discovery of Multiword Expressions with Loanwords and Their Equivalents in the Persian Language

521 - Association for Computation Linguistics 2021 مقالة

This paper presents an attempt at multiword expressions (MWEs) discovery in the Persian language. It focuses on extracting MWEs containing lemmas of a particular group: loanwords in Persian and their equivalents proposed by the Academy of Persian Lan guage and Literature. In order to discover such MWEs, four association measures (AMs) are used and evaluated. Finally, the list of extracted MWEs is analyzed, and a comparison between expressions with loanwords and equivalents is presented. To our knowledge, this is the first time such analysis was provided for the Persian language.

سورانيا multiword expressions persian تعبيرات متعددة الكلمات صناعة حمض الفوسفور

EmoPars: A Collection of 30K Emotion-Annotated Persian Social Media Texts

1084 - Association for Computation Linguistics 2021 مقالة

The wide reach of social media platforms, such as Twitter, have enabled many users to share their thoughts, opinions and emotions on various topics online. The ability to detect these emotions automatically would allow social scientists, as well as, businesses to better understand responses from nations and costumers. In this study we introduce a dataset of 30,000 Persian Tweets labeled with Ekman's six basic emotions (Anger, Fear, Happiness, Sadness, Hatred, and Wonder). This is the first publicly available emotion dataset in the Persian language. In this paper, we explain the data collection and labeling scheme used for the creation of this dataset. We also analyze the created dataset, showing the different features and characteristics of the data. Among other things, we investigate co-occurrence of different emotions in the dataset, and the relationship between sentiment and emotion of textual instances. The dataset is publicly available at https://github.com/nazaninsbr/Persian-Emotion-Detection.

social media texts persian social media media texts نصوص وسائل التواصل الاجتماعي وسائل التواصل الاجتماعي الفارسي نصوص وسائل الإعلام صناعة حمض الفوسفور المزيد..

Leveraging Word-Formation Knowledge for Chinese Word Sense Disambiguation

926 - Association for Computation Linguistics 2021 مقالة

In parataxis languages like Chinese, word meanings are constructed using specific word-formations, which can help to disambiguate word senses. However, such knowledge is rarely explored in previous word sense disambiguation (WSD) methods. In this pap er, we propose to leverage word-formation knowledge to enhance Chinese WSD. We first construct a large-scale Chinese lexical sample WSD dataset with word-formations. Then, we propose a model FormBERT to explicitly incorporate word-formations into sense disambiguation. To further enhance generalizability, we design a word-formation predictor module in case word-formation annotations are unavailable. Experimental results show that our method brings substantial performance improvement over strong baselines.

ذاكرة chinese word sense كلمة الصينية المعنى صناعة حمض الفوسفور

Persian SemCor: A Bag of Word Sense Annotated Corpus for the Persian Language

الفارسي SEMCOR: كيس من معنى الكلمة المشروحة Corpus اللغة الفارسية

Ask ChatGPT about the research

Read More

suggested questions