The Swedish Winogender Dataset


Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English

Created by Shamra Editor

Keywords: Swedish Winogender dataset, Swedish Winogender, English Winogender benchmark


Abstract

We introduce the SweWinogender test set, a diagnostic dataset to measure gender bias in coreference resolution. It is modelled after the English Winogender benchmark, and is released with reference statistics on the distribution of men and women between occupations and the association between gender and occupation in modern corpus material. The paper discusses the design and creation of the dataset, and presents a small investigation of the supplementary statistics.
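As a rough illustration of how a Winogender-style diagnostic set can be used, the sketch below compares a coreference system's accuracy across pronoun groups. The record fields (`pronoun`, `gold`, `predicted`), the toy records, and the function names are assumptions made for this example; they are not the SweWinogender file format or the authors' evaluation code.

```python
from collections import defaultdict


def accuracy_by_pronoun(records):
    """Return {pronoun: accuracy} for a list of prediction records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["pronoun"]] += 1
        if rec["predicted"] == rec["gold"]:
            correct[rec["pronoun"]] += 1
    return {pronoun: correct[pronoun] / total[pronoun] for pronoun in total}


def accuracy_gap(acc, pronoun_a, pronoun_b):
    """Absolute accuracy difference between two pronoun groups."""
    return abs(acc[pronoun_a] - acc[pronoun_b])


# Hypothetical toy records: each one says which antecedent the pronoun
# should resolve to (gold) and what an imagined system predicted.
records = [
    {"pronoun": "han", "gold": "occupation", "predicted": "occupation"},
    {"pronoun": "han", "gold": "participant", "predicted": "participant"},
    {"pronoun": "hon", "gold": "occupation", "predicted": "participant"},
    {"pronoun": "hon", "gold": "participant", "predicted": "participant"},
]

acc = accuracy_by_pronoun(records)
gap = accuracy_gap(acc, "han", "hon")
print(acc, gap)
```

A large gap between otherwise identical sentences that differ only in the pronoun would suggest that the system relies on gender when resolving the reference.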

References used

https://aclanthology.org/


CoDeRooMor: A new dataset for non-inflectional morphology studies of Swedish

Association for Computational Linguistics, 2021 (article)

The paper introduces a new resource, CoDeRooMor, for studying the morphology of modern Swedish word formation. The approximately 16,000 lexical items in the resource have been manually segmented into word-formation morphemes and labeled for their categories, such as prefixes, suffixes, and roots. Word-formation mechanisms, such as derivation and compounding, have been associated with each item on the list. The article describes the selection of items for manual annotation and the principles of annotation, reports on the reliability of the manual annotation, and presents tools, resources and some first statistics. Given the "gold" nature of the resource, it can be used for empirical studies as well as for developing linguistically aware algorithms for morpheme segmentation and labeling (cf. the statistical subword approach). The resource will be made freely available.

Keywords: non-inflectional morphology studies, Swedish word formation, modern Swedish word

Enriching the E2E dataset

Association for Computational Linguistics, 2021 (article)

This study introduces an enriched version of the E2E dataset, one of the most popular language resources for data-to-text NLG. We extract intermediate representations for popular pipeline tasks such as discourse ordering, text structuring, lexicalization and referring expression generation, enabling researchers to rapidly develop and evaluate their data-to-text pipeline systems. The intermediate representations are extracted by aligning non-linguistic and text representations through a process called delexicalization, which consists of replacing the input expressions that refer to entities or attributes with placeholders. The enriched dataset is publicly available.
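The delexicalization step described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the function name `delexicalize`, the `__SLOT__` placeholder format, the exact-string matching, and the example record and sentence are all assumptions made for this sketch.

```python
def delexicalize(text, attributes):
    """Replace each attribute value found in the text with a slot placeholder."""
    result = text
    # Replace longer values first so that a short value which happens to be
    # a substring of a longer one does not break the longer match.
    ordered = sorted(attributes.items(), key=lambda item: len(item[1]), reverse=True)
    for slot, value in ordered:
        placeholder = "__" + slot.upper() + "__"
        result = result.replace(value, placeholder)
    return result


# Hypothetical input record and reference text, made up for illustration.
record = {"name": "The Mill", "food": "Italian", "area": "riverside"}
sentence = "The Mill serves Italian food in the riverside area."

template = delexicalize(sentence, record)
print(template)
```

Exact string matching is the simplest possible choice here; real references often paraphrase attribute values, so a practical system would need a more tolerant alignment.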

Keywords: dataset, enriching, NLG

Introducing CAD: the Contextual Abuse Dataset

Association for Computational Linguistics, 2021 (article)

Online abuse can inflict harm on users and communities, making online spaces unsafe and toxic. Progress in automatically detecting and classifying abusive content is often held back by the lack of high-quality and detailed datasets. We introduce a new dataset of primarily English Reddit entries which addresses several limitations of prior work. It (1) contains six conceptually distinct primary categories as well as secondary categories, (2) has labels annotated in the context of the conversation thread, (3) contains rationales and (4) uses an expert-driven group-adjudication process for high-quality annotations. We report several baseline models to benchmark the work of future researchers. The annotated dataset, annotation guidelines, models and code are freely available.

Keywords: contextual abuse dataset, introducing CAD, contextual abuse

Part-of-speech tagging of Swedish texts in the neural era

Association for Computational Linguistics, 2021 (article)

We train and test five open-source taggers, which use different methods, on three Swedish corpora of comparable size that use different tagsets. The KB-Bert tagger achieves the highest accuracy for part-of-speech and morphological tagging, while being fast enough for practical use. We also compare the performance across tagsets and across different genres in one of the corpora. We perform a manual error analysis and a statistical analysis of the factors that affect how difficult specific tags are. Finally, we test ensemble methods, showing that a small (but not significant) improvement over the best-performing tagger can be achieved.

Keywords: neural era, Swedish texts, Swedish corpora

SuperSim: a test set for word similarity and relatedness in Swedish

Association for Computational Linguistics, 2021 (article)

Language models are notoriously difficult to evaluate. We release SuperSim, a large-scale similarity and relatedness test set for Swedish built with expert human judgements. The test set is composed of 1,360 word pairs independently judged for both relatedness and similarity by five annotators. We evaluate three different models (Word2Vec, fastText, and GloVe) trained on two separate Swedish datasets, namely the Swedish Gigaword corpus and a Swedish Wikipedia dump, to provide a baseline for future comparison. We will release the fully annotated test set, code, models, and data.

Keywords: test set, Swedish set
