Do you want to publish a course? Click here

Automatic Classification of Attributes in German Adjective-Noun Phrases

التصنيف التلقائي للسمات في عبارات الأسماء الألمانية

304   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

Adjectives such as heavy (as in heavy rain) and windy (as in windy day) provide possible values for the attributes intensity and climate, respectively. The attributes themselves are not overtly realized and are in this sense implicit. While these attributes can be easily inferred by humans, their automatic classification poses a challenging task for computational models. We present the following contributions: (1) We gain new insights into the attribute selection task for German. More specifically, we develop computational models for this task that are able to generalize to unseen data. Moreover, we show that classification accuracy depends, inter alia, on the degree of polysemy of the lexemes involved, on the generalization potential of the training data and on the degree of semantic transparency of the adjective-noun pairs in question. (2) We provide the first resource for computational and linguistic experiments with German adjective-noun pairs that can be used for attribute selection and related tasks. In order to safeguard against unwelcome memorization effects, we present an automatic data augmentation method based on a lexical resource that can increase the size of the training data to a large extent.



References used
https://aclanthology.org/
rate research

Read More

Predicting the difficulty of domain-specific vocabulary is an important task towards a better understanding of a domain, and to enhance the communication between lay people and experts. We investigate German closed noun compounds and focus on the int eraction of compound-based lexical features (such as frequency and productivity) and terminology-based features (contrasting domain-specific and general language) across word representations and classifiers. Our prediction experiments complement insights from classification using (a) manually designed features to characterise termhood and compound formation and (b) compound and constituent word embeddings. We find that for a broad binary distinction into easy' vs. difficult' general-language compound frequency is sufficient, but for a more fine-grained four-class distinction it is crucial to include contrastive termhood features and compound and constituent features.
Neutralisation techniques, e.g. denial of responsibility and denial of victim, are used in the narrative of climate change scepticism to justify lack of action or to promote an alternative view. We first draw on social science to introduce the proble m to the community of nlp, present the granularity of the coding schema and then collect manual annotations of neutralised techniques in text relating to climate change, and experiment with supervised and semi- supervised BERT-based models.
We present a data set consisting of German news articles labeled for political bias on a five-point scale in a semi-supervised way. While earlier work on hyperpartisan news detection uses binary classification (i.e., hyperpartisan or not) and English data, we argue for a more fine-grained classification, covering the full political spectrum (i.e., far-left, left, centre, right, far-right) and for extending research to German data. Understanding political bias helps in accurately detecting hate speech and online abuse. We experiment with different classification methods for political bias detection. Their comparatively low performance (a macro-F1 of 43 for our best setup, compared to a macro-F1 of 79 for the binary classification task) underlines the need for more (balanced) data annotated in a fine-grained way.
Video Question Answering (VidQA) evaluation metrics have been limited to a single-word answer or selecting a phrase from a fixed set of phrases. These metrics limit the VidQA models' application scenario. In this work, we leverage semantic roles deri ved from video descriptions to mask out certain phrases, to introduce VidQAP which poses VidQA as a fill-in-the-phrase task. To enable evaluation of answer phrases, we compute the relative improvement of the predicted answer compared to an empty string. To reduce the influence of language bias in VidQA datasets, we retrieve a video having a different answer for the same question. To facilitate research, we construct ActivityNet-SRL-QA and Charades-SRL-QA and benchmark them by extending three vision-language models. We perform extensive analysis and ablative studies to guide future work. Code and data are public.
Pretrained language models (PTLMs) yield state-of-the-art performance on many natural language processing tasks, including syntax, semantics and commonsense. In this paper, we focus on identifying to what extent do PTLMs capture semantic attributes a nd their values, e.g., the correlation between rich and high net worth. We use PTLMs to predict masked tokens using patterns and lists of items from Wikidata in order to verify how likely PTLMs encode semantic attributes along with their values. Such inferences based on semantics are intuitive for humans as part of our language understanding. Since PTLMs are trained on large amount of Wikipedia data we would assume that they can generate similar predictions, yet our findings reveal that PTLMs are still much worse than humans on this task. We show evidence and analysis explaining how to exploit our methodology to integrate better context and semantics into PTLMs using knowledge bases.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا