نقدم مجموعة اختبار Swewinogender، مجموعة بيانات تشخيصية لقياس التحيز بين الجنسين في دقة Aquerence.وهي على غرارها بعد المعيار الإنجليزي الإنجليزي، ويتم إصدارها مع إحصاءات مرجعية بشأن توزيع الرجال والنساء بين المهن والشكام بين الجنسين والاحتلال في مواد الشمال الحديثة.تناقش الورقة تصميم وإنشاء مجموعة البيانات، ويعرض تحقيقا صغيرا في الإحصاءات التكميلية.
We introduce the SweWinogender test set, a diagnostic dataset to measure gender bias in coreference resolution. It is modelled after the English Winogender benchmark, and is released with reference statistics on the distribution of men and women between occupations and the association between gender and occupation in modern corpus material. The paper discusses the design and creation of the dataset, and presents a small investigation of the supplementary statistics.
References used
https://aclanthology.org/
The paper introduces a new resource, CoDeRooMor, for studying the morphology of modern Swedish word formation. The approximately 16.000 lexical items in the resource have been manually segmented into word-formation morphemes, and labeled for their ca
This study introduces an enriched version of the E2E dataset, one of the most popular language resources for data-to-text NLG. We extract intermediate representations for popular pipeline tasks such as discourse ordering, text structuring, lexicaliza
Online abuse can inflict harm on users and communities, making online spaces unsafe and toxic. Progress in automatically detecting and classifying abusive content is often held back by the lack of high quality and detailed datasets.We introduce a new
We train and test five open-source taggers, which use different methods, on three Swedish corpora, which are of comparable size but use different tagsets. The KB-Bert tagger achieves the highest accuracy for part-of-speech and morphological tagging,
Language models are notoriously difficult to evaluate. We release SuperSim, a large-scale similarity and relatedness test set for Swedish built with expert human judgements. The test set is composed of 1,360 word-pairs independently judged for both r