
Sequence Mixup for Zero-Shot Cross-Lingual Part-Of-Speech Tagging


Publication date: 2021
Language: English
Created by Shamra Editor





There have been efforts in cross-lingual transfer learning for various tasks. We present an approach utilizing an interpolative data augmentation method, Mixup, to improve the generalizability of models for part-of-speech tagging trained on a source language, improving their performance on unseen target languages. Through experiments on ten languages with diverse structures and language roots, we demonstrate its applicability for downstream zero-shot cross-lingual tasks.
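To make the idea concrete, below is a minimal sketch of interpolative (Mixup-style) augmentation for sequence tagging: two token-embedding sequences and their one-hot tag distributions are blended with a coefficient drawn from a Beta distribution, and the mixed pair is used as an extra training example with soft-label cross-entropy. The function name, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mixup_sequences(emb_a, tags_a, emb_b, tags_b, num_tags, alpha=0.2):
    """Blend two equal-length token-embedding sequences and their one-hot
    tag distributions (illustrative sketch, not the paper's code)."""
    lam = np.random.beta(alpha, alpha)            # mixing coefficient in (0, 1)
    onehot_a = np.eye(num_tags)[tags_a]           # (seq_len, num_tags)
    onehot_b = np.eye(num_tags)[tags_b]
    mixed_emb = lam * emb_a + (1 - lam) * emb_b   # (seq_len, emb_dim)
    mixed_tags = lam * onehot_a + (1 - lam) * onehot_b
    return mixed_emb, mixed_tags                  # train with soft-label cross-entropy

# Example: two 5-token sentences, 128-dim embeddings, 17 Universal POS tags
emb_a, emb_b = np.random.randn(5, 128), np.random.randn(5, 128)
tags_a, tags_b = np.random.randint(0, 17, 5), np.random.randint(0, 17, 5)
mixed_emb, mixed_tags = mixup_sequences(emb_a, tags_a, emb_b, tags_b, num_tags=17)
```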

Related research

Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost. This is now an essential tool for building low-resource syntactic analyzers such as part-of-speech (POS) taggers. Existing AL heuristics are generally designed on the principle of selecting uncertain yet representative training instances, where annotating these instances may reduce a large number of errors. However, in an empirical study across six typologically diverse languages (German, Swedish, Galician, North Sami, Persian, and Ukrainian), we found the surprising result that even in an oracle scenario where we know the true uncertainty of predictions, these current heuristics are far from optimal. Based on this analysis, we pose the problem of AL as selecting instances that maximally reduce the confusion between particular pairs of output tags. Extensive experimentation on the aforementioned languages shows that our proposed AL strategy outperforms other AL strategies by a significant margin. We also present auxiliary results demonstrating the importance of proper calibration of models, which we ensure through cross-view training, and analysis demonstrating how our proposed strategy selects examples that more closely follow the oracle data distribution. The code is publicly released.
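The core of the proposed selection criterion, picking unlabeled sentences whose annotation would most reduce confusion between specific tag pairs, can be sketched roughly as below. The scoring function and names are illustrative simplifications, not the paper's actual algorithm.

```python
import numpy as np

def confusion_pair_score(tag_probs, pair):
    """Score one sentence by how much predicted probability mass its tokens
    place on a confusable tag pair, e.g. (NOUN, PROPN). Illustrative only."""
    i, j = pair
    return float(np.minimum(tag_probs[:, i], tag_probs[:, j]).sum())

def select_for_annotation(pool_probs, pair, k=10):
    """Pick the k unlabeled sentences with the highest confusion-pair score.
    pool_probs: list of (seq_len, num_tags) arrays of predicted tag probabilities."""
    scores = [confusion_pair_score(p, pair) for p in pool_probs]
    return np.argsort(scores)[::-1][:k]
```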
We train and test five open-source taggers, which use different methods, on three Swedish corpora, which are of comparable size but use different tagsets. The KB-Bert tagger achieves the highest accuracy for part-of-speech and morphological tagging, while being fast enough for practical use. We also compare the performance across tagsets and across different genres in one of the corpora. We perform manual error analysis and perform a statistical analysis of factors which affect how difficult specific tags are. Finally, we test ensemble methods, showing that a small (but not significant) improvement over the best-performing tagger can be achieved.
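As a rough illustration of how tagger outputs can be ensembled, the sketch below combines per-token predictions by majority vote, with ties broken in favour of the first tagger; the exact ensembling scheme evaluated in the paper may differ.

```python
from collections import Counter

def majority_vote(tag_sequences):
    """Combine per-token predictions from several taggers by majority vote;
    ties go to the earlier tagger (Counter keeps insertion order)."""
    return [Counter(token_tags).most_common(1)[0][0]
            for token_tags in zip(*tag_sequences)]

# Example: three taggers, one five-token sentence
predictions = [
    ["DET", "NOUN", "VERB", "ADV", "PUNCT"],
    ["DET", "NOUN", "VERB", "ADJ", "PUNCT"],
    ["DET", "PROPN", "VERB", "ADV", "PUNCT"],
]
print(majority_vote(predictions))  # ['DET', 'NOUN', 'VERB', 'ADV', 'PUNCT']
```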
Code-mixing (CM) is a frequently observed phenomenon in which multiple languages are used within a single utterance or sentence. Code-mixed text follows no strict grammatical constraints and often contains non-standard spelling variations. The resulting linguistic complexity makes computational analysis of code-mixed language a challenging task. Language identification (LI) and part-of-speech (POS) tagging are the fundamental steps that help analyze the structure of code-mixed text, and the two tasks are often interdependent in the code-mixing scenario. We frame the problem of handling multilingualism and grammatical structure in code-mixed sentences as a joint learning task. In this paper, we jointly train and optimize language detection and part-of-speech tagging models in the code-mixed scenario, using a Transformer with a convolutional neural network architecture, and we train the joint model on code-mixed social media text from the ICON shared task.
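The joint-learning setup can be illustrated with a minimal PyTorch sketch: two classification heads share the same token encodings and their cross-entropy losses are summed. This is a simplified, assumption-based illustration, not the authors' Transformer-with-CNN architecture.

```python
import torch
import torch.nn as nn

class JointLIPOSHead(nn.Module):
    """Two linear heads over shared token encodings for joint language
    identification (LI) and POS tagging; illustrative sketch only."""
    def __init__(self, hidden_dim, num_langs, num_tags):
        super().__init__()
        self.li_head = nn.Linear(hidden_dim, num_langs)
        self.pos_head = nn.Linear(hidden_dim, num_tags)
        self.loss = nn.CrossEntropyLoss()

    def forward(self, token_states, lang_labels, pos_labels):
        # token_states: (batch, seq_len, hidden_dim) from any shared encoder
        li_logits = self.li_head(token_states)
        pos_logits = self.pos_head(token_states)
        joint_loss = (self.loss(li_logits.flatten(0, 1), lang_labels.flatten())
                      + self.loss(pos_logits.flatten(0, 1), pos_labels.flatten()))
        return joint_loss, li_logits, pos_logits
```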
In this work, we provide an extensive part-of-speech analysis of the discourse of social media users with depression. Research in psychology revealed that depressed users tend to be self-focused, more preoccupied with themselves and ruminate more about their lives and emotions. Our work aims to make use of large-scale datasets and computational methods for a quantitative exploration of discourse. We use the publicly available depression dataset from the Early Risk Prediction on the Internet Workshop (eRisk) 2018 and extract part-of-speech features and several indices based on them. Our results reveal statistically significant differences between the depressed and non-depressed individuals confirming findings from the existing psychology literature. Our work provides insights regarding the way in which depressed individuals are expressing themselves on social media platforms, allowing for better-informed computational models to help monitor and prevent mental illnesses.
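As a small illustration of POS-based discourse features, the sketch below computes pronoun-related indices with NLTK; the study's actual feature set is richer, and the specific indices here are assumptions for demonstration.

```python
import nltk  # requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

def pos_indices(text):
    """Compute simple POS-based indices, e.g. the share of pronouns and of
    first-person singular pronouns (illustrative, not the paper's feature set)."""
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)  # Penn Treebank tags; personal pronouns are 'PRP'
    pronouns = [w for w, t in tagged if t == "PRP"]
    first_person = [w for w in pronouns if w.lower() in {"i", "me", "myself"}]
    return {
        "pronoun_ratio": len(pronouns) / max(len(tokens), 1),
        "first_person_ratio": len(first_person) / max(len(tokens), 1),
    }

print(pos_indices("I keep thinking about myself and the mistakes I made."))
```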
The available free groundwater in part of Damascus Ghouta, an area suffering from water deficiency, was classified hydrochemically and assessed for its suitability for general uses. The study was carried out on groundwater samples taken from 20 wells distributed across the entire study area. The results show that the groundwater is classified hydrochemically as calcic water; it is non-potable and unsuitable for domestic consumption in a broad part of the study area, but it is arable and recommended for irrigating crops with weak resistance to salinity. It is generally not preferable for industrial use, but it is good for building and concrete works.
