Language models are notoriously difficult to evaluate. We release SuperSim, a large-scale similarity and relatedness test set for Swedish built with expert human judgements. The test set is composed of 1,360 word-pairs independently judged for both relatedness and similarity by five annotators. We evaluate three different models (Word2Vec, fastText, and GloVe) trained on two separate Swedish datasets, namely the Swedish Gigaword corpus and a Swedish Wikipedia dump, to provide a baseline for future comparison. We will release the fully annotated test set, code, models, and data.
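Evaluation on a similarity test set of this kind typically means scoring each word pair with the model's cosine similarity and comparing the resulting ranking against the averaged human ratings using Spearman's rank correlation. A minimal sketch of that procedure, with invented Swedish word pairs, toy vectors, and hypothetical ratings standing in for a trained Word2Vec, fastText, or GloVe model (SuperSim's actual file format and rating scale may differ):

```python
import numpy as np
from scipy.stats import spearmanr

# Toy embeddings standing in for a trained model's vectors.
# (Invented values for illustration only.)
embeddings = {
    "hund":  np.array([0.9, 0.1, 0.2]),
    "katt":  np.array([0.8, 0.2, 0.1]),
    "bil":   np.array([0.1, 0.9, 0.3]),
    "cykel": np.array([0.2, 0.8, 0.4]),
}

# Hypothetical test-set rows: (word1, word2, mean human similarity rating).
test_pairs = [
    ("hund", "katt", 4.5),
    ("hund", "bil", 0.8),
    ("bil", "cykel", 3.9),
]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in test_pairs]
human_scores = [rating for _, _, rating in test_pairs]

# Spearman's rho measures how well the model's similarity ranking
# agrees with the human ranking, ignoring absolute scale.
rho, p_value = spearmanr(model_scores, human_scores)
print(f"Spearman's rho: {rho:.3f}")
```

The same loop applies unchanged whether the third column holds the similarity or the relatedness judgements, which is why a single test set annotated for both allows the two phenomena to be compared directly.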