Research papers, master's and doctoral theses about Cross-document Structure Theory

Neural Language Models vs Wordnet-based Semantically Enriched Representation in CST Relation Recognition

243 - Association for Computational Linguistics 2021 Article

Neural language models, including transformer-based models, that are pre-trained on very large corpora have become a common way to represent text in various tasks, including recognition of textual semantic relations, e.g. those of Cross-document Structure Theory (CST). Pre-trained models are usually fine-tuned to downstream tasks and the obtained vectors are used as input for deep neural classifiers. No linguistic knowledge obtained from resources and tools is utilised. In this paper we compare such universal approaches with a combination of a rich graph-based, linguistically motivated sentence representation and a typical neural network classifier, applied to the task of recognising CST relations in Polish. The representation describes selected levels of the sentence structure, including a description of lexical meanings on the basis of wordnet (plWordNet) synsets and connected SUMO concepts. The obtained results show that, in the case of difficult relations and a medium-sized training corpus, the semantically enriched text representation leads to significantly better results.

document clustering, wordnet-based semantically enriched, cross-document structure theory

What is on Social Media that is not in WordNet? A Preliminary Analysis on the TwitterAAE Corpus

202 - Association for Computational Linguistics 2021 Article

Natural Language Processing tools and resources have so far been created and trained mainly for standard varieties of language. Nowadays, with the use of large amounts of data gathered from social media, other varieties and registers need to be processed, which may present other challenges and difficulties. In this work, we focus on English and present a preliminary analysis that compares the TwitterAAE corpus, which is annotated for ethnicity, with WordNet, quantifying and explaining the online language that WordNet misses.

cross-document structure theory, preliminary analysis, TwitterAAE corpus
