Part-of-speech tagging of Swedish texts in the neural era

Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

neural era, Swedish texts, Swedish corpora

Abstract

We train and test five open-source taggers, which use different methods, on three Swedish corpora, which are of comparable size but use different tagsets. The KB-Bert tagger achieves the highest accuracy for part-of-speech and morphological tagging, while being fast enough for practical use. We also compare the performance across tagsets and across different genres in one of the corpora. We perform a manual error analysis and a statistical analysis of the factors that affect how difficult specific tags are. Finally, we test ensemble methods, showing that a small (but not significant) improvement over the best-performing tagger can be achieved.
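The abstract does not say which ensemble methods were tested. As an illustration only, the sketch below combines the per-token output of several taggers by simple majority vote, which is one common way to build a tagger ensemble. The function name `majority_vote`, the `tie_breaker` parameter, and the example tags are assumptions made for this sketch and do not come from the paper.

```python
from collections import Counter


def majority_vote(predictions, tie_breaker=0):
    """Combine tag sequences from several taggers by per-token majority vote.

    predictions: list of tag sequences, one per tagger, all the same length.
    tie_breaker: index of the tagger whose tag is preferred when votes tie
                 (hypothetical parameter, e.g. the best single tagger).
    """
    if not predictions:
        raise ValueError("need at least one tagger output")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all taggers must tag the same tokens")

    combined = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        top = max(counts.values())
        winners = [tag for tag, n in counts.items() if n == top]
        if len(winners) == 1:
            combined.append(winners[0])
        else:
            # On a tie, fall back to the preferred tagger if it is among the winners.
            preferred = token_tags[tie_breaker]
            combined.append(preferred if preferred in winners else winners[0])
    return combined


if __name__ == "__main__":
    outputs = [
        ["NOUN", "VERB", "ADJ"],   # tagger A
        ["NOUN", "NOUN", "ADJ"],   # tagger B
        ["PRON", "VERB", "ADV"],   # tagger C
    ]
    print(majority_vote(outputs))
```

Using the strongest single tagger as the tie-breaker means the ensemble only departs from that tagger when the others outvote it.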

References used

https://aclanthology.org/

Reducing Confusion in Active Learning for Part-Of-Speech Tagging

283 - Association for Computational Linguistics 2021 (article)

Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost. This is now an essential tool for building low-resource syntactic analyzers such as part-of-speech (POS) taggers. Existing AL heuristics are generally designed on the principle of selecting uncertain yet representative training instances, where annotating these instances may reduce a large number of errors. However, in an empirical study across six typologically diverse languages (German, Swedish, Galician, North Sami, Persian, and Ukrainian), we found the surprising result that even in an oracle scenario where we know the true uncertainty of predictions, these current heuristics are far from optimal. Based on this analysis, we pose the problem of AL as selecting instances that maximally reduce the confusion between particular pairs of output tags. Extensive experimentation on the aforementioned languages shows that our proposed AL strategy outperforms other AL strategies by a significant margin. We also present auxiliary results demonstrating the importance of proper calibration of models, which we ensure through cross-view training, and analysis demonstrating how our proposed strategy selects examples that more closely follow the oracle data distribution. The code is publicly released.

reducing confusion, tagging

Sequence Mixup for Zero-Shot Cross-Lingual Part-Of-Speech Tagging

368 - Association for Computational Linguistics 2021 (article)

There have been efforts in cross-lingual transfer learning for various tasks. We present an approach utilizing an interpolative data augmentation method, Mixup, to improve the generalizability of models for part-of-speech tagging trained on a source language, improving its performance on unseen target languages. Through experiments on ten languages with diverse structures and language roots, we put forward its applicability for downstream zero-shot cross-lingual tasks.
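The abstract names Mixup but gives no implementation details. The following is a minimal sketch of generic Mixup interpolation applied to a pair of tagged sequences. It assumes both sequences have the same length, tokens are already represented as embedding vectors, and labels are one-hot vectors. The function `mixup_pair` and the default `alpha` value are illustrative assumptions, not the paper's method.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Interpolate two equal-length sequences and their per-token labels.

    x1, x2: lists of token embedding vectors (lists of floats).
    y1, y2: lists of one-hot label vectors, one per token.
    alpha:  Beta distribution parameter controlling mixing strength (assumed value).
    Returns the mixed inputs, the mixed (soft) labels, and the sampled coefficient.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]

    def mix(a, b):
        # Convex combination of two vectors, element by element.
        return [lam * u + (1 - lam) * v for u, v in zip(a, b)]

    mixed_x = [mix(t1, t2) for t1, t2 in zip(x1, x2)]
    mixed_y = [mix(l1, l2) for l1, l2 in zip(y1, y2)]
    return mixed_x, mixed_y, lam


if __name__ == "__main__":
    x_a = [[1.0, 0.0], [0.0, 1.0]]   # two tokens, 2-d embeddings
    y_a = [[1, 0], [0, 1]]           # one-hot tags
    x_b = [[0.0, 2.0], [2.0, 0.0]]
    y_b = [[0, 1], [0, 1]]
    mixed_x, mixed_y, lam = mixup_pair(x_a, y_a, x_b, y_b, rng=random.Random(0))
    print(lam, mixed_x, mixed_y)
```

Because the labels are mixed with the same coefficient as the inputs, each mixed label vector remains a valid probability distribution that a tagger can be trained against with a soft-label loss.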

sequence mixup, zero-shot cross-lingual

A Psychologically Informed Part-of-Speech Analysis of Depression in Social Media

405 - Association for Computational Linguistics 2021 (article)

In this work, we provide an extensive part-of-speech analysis of the discourse of social media users with depression. Research in psychology revealed that depressed users tend to be self-focused, more preoccupied with themselves and ruminate more about their lives and emotions. Our work aims to make use of large-scale datasets and computational methods for a quantitative exploration of discourse. We use the publicly available depression dataset from the Early Risk Prediction on the Internet Workshop (eRisk) 2018 and extract part-of-speech features and several indices based on them. Our results reveal statistically significant differences between the depressed and non-depressed individuals confirming findings from the existing psychology literature. Our work provides insights regarding the way in which depressed individuals are expressing themselves on social media platforms, allowing for better-informed computational models to help monitor and prevent mental illnesses.
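The abstract does not list the specific part-of-speech features or indices used. As a rough sketch of what such features can look like, the code below computes relative tag frequencies and a simple ratio between two tags from an already tagged text. The function names, the tag labels, and the pronoun-to-noun ratio are assumptions chosen for illustration only.

```python
from collections import Counter


def pos_relative_frequencies(tags):
    """Return the share of each POS tag in a tagged text."""
    total = len(tags)
    if total == 0:
        return {}
    counts = Counter(tags)
    return {tag: n / total for tag, n in counts.items()}


def tag_ratio(tags, numerator, denominator):
    """Return count(numerator) / count(denominator), or None if undefined."""
    counts = Counter(tags)
    if counts[denominator] == 0:
        return None
    return counts[numerator] / counts[denominator]


if __name__ == "__main__":
    # Hypothetical tags for a short post; a real pipeline would get these from a tagger.
    tags = ["PRON", "VERB", "PRON", "NOUN"]
    print(pos_relative_frequencies(tags))
    print(tag_ratio(tags, "PRON", "NOUN"))
```

Features of this kind can be computed per user and then compared between groups with a standard significance test.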

psychologically informed, social media users

Synonym Replacement based on a Study of Basic-level Nouns in Swedish Texts of Different Complexity

411 - Association for Computational Linguistics 2021 (article)

Basic-level terms have been described as the most important to human categorisation. They are the earliest emerging words in children's language acquisition, and seem to be more frequently occurring in language in general. In this article, we explored the use of basic-level nouns in texts of different complexity, and hypothesise that hypernyms with characteristics of basic-level words could be useful for the task of lexical simplification. We conducted two corpus studies using four different corpora, two corpora of standard Swedish and two corpora of simple Swedish, and explored whether corpora of simple texts contain a higher proportion of basic-level nouns than corpora of standard Swedish. Based on insights from the corpus studies, we developed a novel algorithm for choosing the best synonym by rewarding high relative frequencies and monolexemity, and restricting the climb in the word hierarchy not to suggest synonyms of a too high level of inclusiveness.

synonym replacement based, study of basic-level, basic-level nouns

A Pre-trained Transformer and CNN Model with Joint Language ID and Part-of-Speech Tagging for Code-Mixed Social-Media Text

396 - Association for Computational Linguistics 2021 (article)

Code-mixing (CM) is a frequently observed phenomenon that uses multiple languages in an utterance or sentence. There are no strict grammatical constraints observed in code-mixing, and it consists of non-standard variations of spelling. The linguistic complexity resulting from the above factors made the computational analysis of the code-mixed language a challenging task. Language identification (LI) and part of speech (POS) tagging are the fundamental steps that help analyze the structure of the code-mixed text. Often, the LI and POS tagging tasks are interdependent in the code-mixing scenario. We project the problem of dealing with multilingualism and grammatical structure while analyzing the code-mixed sentence as a joint learning task. In this paper, we jointly train and optimize language detection and part of speech tagging models in the code-mixed scenario. We used a Transformer with convolutional neural network architecture. We train a joint learning method by combining POS tagging and LI models on code-mixed social media text obtained from the ICON shared task.

pre-trained transformer, CNN model, code-mixed social-media text
