CoDeRooMor: A new dataset for non-inflectional morphology studies of Swedish

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Language: English

Created by Shamra Editor

non-inflectional morphology studies, Swedish word formation, modern Swedish word

The paper introduces a new resource, CoDeRooMor, for studying the morphology of modern Swedish word formation. The approximately 16,000 lexical items in the resource have been manually segmented into word-formation morphemes and labeled for their categories, such as prefixes, suffixes, and roots. Word-formation mechanisms, such as derivation and compounding, have been associated with each item on the list. The article describes the selection of items for manual annotation and the principles of annotation, reports on the reliability of the manual annotation, and presents tools, resources and some first statistics. Given the "gold" nature of the resource, it is possible to use it for empirical studies as well as to develop linguistically aware algorithms for morpheme segmentation and labeling (cf. the statistical subword approach). The resource will be made freely available.
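As a rough illustration of the kind of annotation the abstract describes, the sketch below models one lexical item as a sequence of labeled morphemes plus a word-formation mechanism. The class names, field names, and the example entry are all hypothetical; the actual file format and label set of CoDeRooMor are not given here and may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Morpheme:
    form: str      # surface string of the morpheme
    category: str  # assumed label, e.g. "prefix", "root", "suffix"


@dataclass(frozen=True)
class LexicalItem:
    lemma: str
    morphemes: List[Morpheme]
    mechanism: str  # assumed label, e.g. "derivation", "compounding"

    def segmentation(self, sep: str = "+") -> str:
        """Join the morpheme forms with a separator."""
        return sep.join(m.form for m in self.morphemes)

    def forms_with_category(self, category: str) -> List[str]:
        """Return the forms of all morphemes carrying the given label."""
        return [m.form for m in self.morphemes if m.category == category]


# Illustrative entry only; it is not taken from the resource.
item = LexicalItem(
    lemma="olycklig",
    morphemes=[
        Morpheme("o", "prefix"),
        Morpheme("lyck", "root"),
        Morpheme("lig", "suffix"),
    ],
    mechanism="derivation",
)

print(item.segmentation())
print(item.forms_with_category("root"))
```

Keeping the category on each morpheme, rather than on the whole word, is what would let the same structure serve both segmentation and labeling experiments.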

References used

https://aclanthology.org/

Negative language transfer in learner English: A new dataset

Association for Computational Linguistics, 2021 (article)

Automatic personalized corrective feedback can help language learners from different backgrounds better acquire a new language. This paper introduces a learner English dataset in which learner errors are accompanied by information about possible error sources. This dataset contains manually annotated error causes for learner writing errors. These causes tie learner mistakes to structures from their first languages, when the rules in English and in the first language diverge. This new dataset will enable second language acquisition researchers to computationally analyze a large quantity of learner errors that are related to language transfer from the learners' first language. The dataset can also be applied in personalizing grammatical error correction systems according to the learners' first language and in providing feedback that is informed by the cause of an error.

blindfolded, learner English

WRIME: A New Dataset for Emotional Intensity Estimation with Subjective and Objective Annotations

Association for Computational Linguistics, 2021 (article)

We annotate 17,000 SNS posts with both the writer's subjective emotional intensity and the reader's objective one to construct a Japanese emotion analysis dataset. In this study, we explore the difference between the emotional intensity of the writer and that of the readers with this dataset. We found that the reader cannot fully detect the emotions of the writer, especially anger and trust. In addition, experimental results in estimating the emotional intensity show that it is more difficult to estimate the writer's subjective labels than the readers'. The large gap between the subjective and objective emotions implies the complexity of the mapping from a post to the subjective emotion intensities, which also leads to lower performance with machine learning models.

emotional intensity estimation, emotional intensity, subjective emotional intensity

A New Dataset and Efficient Baselines for Document-level Text Simplification in German

Association for Computational Linguistics, 2021 (article)

The task of document-level text simplification is very similar to summarization with the additional difficulty of reducing complexity. We introduce a newly collected data set of German texts from the Swiss news magazine 20 Minuten ('20 Minutes') that consists of full articles paired with simplified summaries. Furthermore, we present experiments on automatic text simplification with the pretrained multilingual mBART and a modified version thereof that is more memory-friendly, using both our new data set and existing simplification corpora. Our modifications of mBART let us train at a lower memory cost without much loss in performance; in fact, the smaller mBART even improves over the standard model in a setting with multiple simplification levels.

dataset and efficient, efficient baselines, document-level text simplification

The Swedish Winogender Dataset

Association for Computational Linguistics, 2021 (article)

We introduce the SweWinogender test set, a diagnostic dataset to measure gender bias in coreference resolution. It is modelled after the English Winogender benchmark, and is released with reference statistics on the distribution of men and women between occupations and the association between gender and occupation in modern corpus material. The paper discusses the design and creation of the dataset, and presents a small investigation of the supplementary statistics.

Swedish Winogender dataset, Swedish Winogender, English Winogender benchmark

RED: A Novel Dataset for Romanian Emotion Detection from Tweets

Association for Computational Linguistics, 2021 (article)

For the Romanian language there are some resources for automatic text comprehension, but none for Emotion Detection that are not lexicon-based. To cover this gap, we extracted data from Twitter and created the first dataset containing tweets annotated with five types of emotions: joy, fear, sadness, anger and neutral, with the intent of being used for opinion mining and analysis tasks. In this article we present some features of our novel dataset, and create a benchmark to achieve the first supervised machine learning model for automatic Emotion Detection in Romanian short texts. We investigate the performance of four classical machine learning models: Multinomial Naive Bayes, Logistic Regression, Support Vector Classification and Linear Support Vector Classification. We also investigate more modern approaches like fastText, which makes use of subword information. Lastly, we fine-tune the Romanian BERT for text classification, and our experiments show that the BERT-based model has the best performance for the task of Emotion Detection from Romanian tweets. Keywords: Emotion Detection, Twitter, Romanian, Supervised Machine Learning

Universal Dependencies, Romanian emotion detection, support vector classification
