We train and test five open-source taggers, which use different methods, on three Swedish corpora, which are of comparable size but use different tagsets. The KB-Bert tagger achieves the highest accuracy for part-of-speech and morphological tagging, while being fast enough for practical use. We also compare performance across tagsets and across different genres in one of the corpora. We carry out a manual error analysis and a statistical analysis of the factors that affect how difficult specific tags are. Finally, we test ensemble methods, showing that a small (but not significant) improvement over the best-performing tagger can be achieved.