
Neural Language Models vs Wordnet-based Semantically Enriched Representation in CST Relation Recognition


Publication date: 2021
Language: English





Neural language models, including transformer-based models pre-trained on very large corpora, have become a common way to represent text in various tasks, including the recognition of textual semantic relations such as those of Cross-document Structure Theory (CST). Pre-trained models are usually fine-tuned on downstream tasks, and the resulting vectors are used as input to deep neural classifiers; no linguistic knowledge from external resources and tools is utilised. In this paper we compare such universal approaches with a combination of a rich, graph-based, linguistically motivated sentence representation and a typical neural network classifier, applied to the recognition of CST relations in Polish. The representation describes selected levels of sentence structure, including lexical meanings expressed through wordnet (plWordNet) synsets and the SUMO concepts connected to them. The obtained results show that, for difficult relations and a medium-sized training corpus, the semantically enriched text representation leads to significantly better results.
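For orientation, here is a minimal sketch of the two kinds of input being compared, not the authors' implementation: the model name (a Polish BERT), the number of CST labels, and the use of Princeton WordNet through NLTK as a stand-in for plWordNet are all assumptions.

    # Sketch of the two compared representations; model name, label count and
    # the WordNet stand-in are assumptions, not the paper's exact setup.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

    MODEL = "allegro/herbert-base-cased"  # a Polish BERT; assumed, not named in the abstract
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    # The classification head is freshly initialized and would be fine-tuned.
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=5)  # 5 CST labels assumed

    def transformer_scores(sent_a: str, sent_b: str) -> torch.Tensor:
        # Universal approach: encode the sentence pair jointly, classify the pooled vector.
        batch = tokenizer(sent_a, sent_b, return_tensors="pt", truncation=True)
        with torch.no_grad():
            return model(**batch).logits  # unnormalized scores over CST relation labels

    def synset_features(tokens: list[str]) -> list[str]:
        # Enriched approach, greatly simplified: coarse semantic classes of the tokens'
        # synsets; the paper's graph representation also includes SUMO concepts.
        return sorted({s.lexname() for t in tokens for s in wn.synsets(t)})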




Read More

In the paper, we deal with the problem of unsupervised text document clustering for the Polish language. Our goal is to compare the modern approaches based on language modeling (doc2vec and BERT) with the classical ones, i.e., TF-IDF and wordnet-based. The experiments are conducted on three datasets containing qualification descriptions. The experiments' results showed that wordnet-based similarity measures could compete with and even outperform modern embedding-based approaches.
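For reference, a minimal sketch of the classical TF-IDF clustering baseline mentioned above; the documents and cluster count are placeholders, not the paper's data.

    # Classical unsupervised baseline: TF-IDF vectors + k-means clustering.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    docs = ["first qualification description", "second qualification description",
            "an unrelated text about something else"]  # placeholder documents
    X = TfidfVectorizer().fit_transform(docs)                 # sparse TF-IDF matrix
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)   # cluster assignments
    print(labels)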
This paper describes the development of an online lexical resource to help detection systems regulate and curb the use of offensive words online. With the growing prevalence of social media platforms, many conversations are now conducted online. The increase of online conversations for leisure, work and socializing has led to an increase in harassment. In particular, we create a specialized sense-based vocabulary of Japanese offensive words for the Open Multilingual Wordnet. This vocabulary expands on an existing list of Japanese offensive words and provides categorization and proper linking to synsets within the multilingual wordnet. This paper then discusses the evaluation of the vocabulary as a resource for representing and classifying offensive words and as a possible resource for offensive word use detection in social media.
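As an illustration of the kind of synset linking involved, a hedged sketch of querying Japanese lemmas through the Open Multilingual Wordnet with NLTK; the example lemma is arbitrary, and the paper's offensive-word vocabulary itself is a separate resource built on top of OMW.

    # Look up Japanese lemmas in the Open Multilingual Wordnet via NLTK.
    import nltk
    nltk.download("wordnet")
    nltk.download("omw-1.4")
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets("犬", lang="jpn"):  # "dog"; any OMW-linked lemma works
        print(synset.name(), synset.definition())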
Relational knowledge bases (KBs) are commonly used to represent world knowledge in machines. However, while advantageous for their high degree of precision and interpretability, KBs are usually organized according to manually-defined schemas, which limit their expressiveness and require significant human effort to engineer and maintain. In this review, we take a natural language processing perspective on these limitations, examining how they may be addressed in part by training deep contextual language models (LMs) to internalize and express relational knowledge in more flexible forms. We propose to organize knowledge representation strategies in LMs by the level of KB supervision provided, from no KB supervision at all to entity- and relation-level supervision. Our contributions are threefold: (1) We provide a high-level, extensible taxonomy for knowledge representation in LMs; (2) Within our taxonomy, we highlight notable models, evaluation tasks, and findings, in order to provide an up-to-date review of current knowledge representation capabilities in LMs; and (3) We suggest future research directions that build upon the complementary aspects of LMs and KBs as knowledge representations.
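A minimal sketch of the zero-supervision end of this taxonomy: LAMA-style cloze probing of a masked LM for relational knowledge; the model and prompt are illustrative choices.

    # Probe a masked LM for relational knowledge with no KB supervision.
    from transformers import pipeline

    probe = pipeline("fill-mask", model="bert-base-uncased")
    for candidate in probe("Paris is the capital of [MASK]."):
        print(candidate["token_str"], round(candidate["score"], 3))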
Knowledge-enriched text generation poses unique challenges in modeling and learning, driving active research in several core directions, ranging from integrated modeling of neural representations and symbolic information in sequential/hierarchical/graphical structures, learning without direct supervision due to the cost of structured annotation, and efficient optimization and inference with massive and global constraints, to language grounding on multiple modalities, and generative reasoning with implicit commonsense and background knowledge. In this tutorial we will present a roadmap that lines up the state-of-the-art methods for tackling these challenges. We will dive deep into various technical components: how to represent knowledge, how to feed knowledge into a generation model, how to evaluate generation results, and what the remaining challenges are.
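As the simplest instance of feeding knowledge into a generation model, a hedged sketch that concatenates a retrieved fact to the input of a seq2seq model; the model and prompt format are assumptions, and the tutorial covers far richer integration strategies.

    # Simplest knowledge-feeding strategy: prepend a retrieved fact to the input.
    from transformers import pipeline

    generator = pipeline("text2text-generation", model="google/flan-t5-small")
    knowledge = "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911)."
    question = "How many Nobel Prizes did Marie Curie win?"
    prompt = f"Answer using the context. Context: {knowledge} Question: {question}"
    print(generator(prompt)[0]["generated_text"])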
Currently, there are two available wordnets for Turkish: TR-wordnet of BalkaNet and KeNet. As the more comprehensive wordnet for Turkish, KeNet includes 76,757 synsets. KeNet both contains intralingual semantic relations and is linked to PWN through interlingual relations. In this paper, we present the procedure adopted in creating KeNet, give details about our approach to annotating semantic relations such as hypernymy, and discuss the language-specific problems encountered in these processes.
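To make hypernymy concrete, a small sketch that walks a hypernym chain; Princeton WordNet via NLTK stands in here, since KeNet's own access API is not described in this abstract.

    # Walk up a hypernym chain, the kind of intralingual relation annotated in
    # KeNet; Princeton WordNet via NLTK is used purely as a stand-in.
    from nltk.corpus import wordnet as wn

    synset = wn.synsets("dog")[0]          # e.g. dog.n.01
    while synset.hypernyms():
        synset = synset.hypernyms()[0]     # follow the first hypernym upward
        print(synset.name())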


