
Taboo Wordnet


 Publication date 2021
 Research language: English
 Created by Shamra Editor





This paper describes the development of an online lexical resource to help detection systems regulate and curb the use of offensive words online. With the growing prevalence of social media platforms, many conversations are now conducted online. The increase of online conversations for leisure, work and socializing has led to an increase in harassment. In particular, we create a specialized sense-based vocabulary of Japanese offensive words for the Open Multilingual Wordnet. This vocabulary expands on an existing list of Japanese offensive words and provides categorization and proper linking to synsets within the multilingual wordnet. This paper then discusses the evaluation of the vocabulary as a resource for representing and classifying offensive words and as a possible resource for offensive word use detection in social media.
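As a rough illustration of what a sense-based vocabulary with categories and synset links could look like, here is a minimal sketch in Python. It is not the paper's data model: the class `TabooSense`, the helper names, the synset identifiers and the category labels are all hypothetical placeholders, and no entry is taken from the actual resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TabooSense:
    """One sense of a listed word (hypothetical structure, for illustration only)."""
    lemma: str      # surface form of the word
    synset_id: str  # placeholder synset identifier, not a real wordnet ID
    category: str   # placeholder category label


# Placeholder entries: one lemma can carry several senses,
# each linked to its own synset and category.
ENTRIES = [
    TabooSense("lemma_a", "synset-0001", "category_x"),
    TabooSense("lemma_a", "synset-0002", "category_y"),
    TabooSense("lemma_b", "synset-0003", "category_x"),
]


def build_index(entries):
    """Group senses by lemma so that lookups by surface form are direct."""
    index = {}
    for entry in entries:
        index.setdefault(entry.lemma, []).append(entry)
    return index


def flag_tokens(tokens, index):
    """Return (token, candidate senses) pairs for tokens found in the index."""
    return [(token, index[token]) for token in tokens if token in index]


INDEX = build_index(ENTRIES)
```

A lemma match only yields candidate senses. Deciding which sense a given occurrence actually carries would need a separate disambiguation step, which this sketch does not attempt.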



References used
https://aclanthology.org/

Read More

Currently, there are two available wordnets for Turkish: TR-wordnet of BalkaNet and KeNet. As the more comprehensive wordnet for Turkish, KeNet includes 76,757 synsets. KeNet has intralingual semantic relations and is linked to PWN through interlingual relations. In this paper, we present the procedure adopted in creating KeNet, give details about our approach in annotating semantic relations such as hypernymy and discuss the language-specific problems encountered in these processes.
The paper presents the project Semantic Network with a Wide Range of Semantic Relations and its main achievements. The ultimate objective of the project is to expand Princeton WordNet with conceptual frames that define the syntagmatic relations of verb synsets and the semantic classes of nouns felicitous to combine with particular verbs. At this stage of the work: a) over 5,000 WordNet verb synsets have been supplied with manually evaluated FrameNet semantic frames, b) 253 semantic types have been manually mapped to the appropriate WordNet concepts providing detailed ontological representation of the semantic classes of nouns.
WordNet is the most widely used lexical resource for English, while Wikidata is one of the largest knowledge graphs of entities and concepts available. While there is a clear difference in the focus of these two resources, there is also a significant overlap and as such a complete linking of these resources would have many uses. We propose the development of such a linking, first by means of the hapax legomenon links and secondly by the use of natural language processing techniques. We show that these can be done with high accuracy but that human validation is still necessary. This has resulted in over 9,000 links being added between these two resources.
The vast majority of the existing approaches for taxonomy enrichment apply word embeddings as they have proven to accumulate contexts (in a broad sense) extracted from texts which are sufficient for attaching orphan words to the taxonomy. On the other hand, apart from being large lexical and semantic resources, taxonomies are graph structures. Combining word embeddings with the graph structure of the taxonomy could be of use for predicting taxonomic relations. In this paper we compare several approaches for attaching new words to the existing taxonomy which are based on the graph representations with the one that relies on fastText embeddings. We test all methods on Russian and English datasets, but they could also be applied to other wordnets and languages.
Neural language models, including transformer-based models, that are pre-trained on very large corpora have become a common way to represent text in various tasks, including recognition of textual semantic relations, e.g. Cross-document Structure Theory (CST). Pre-trained models are usually fine-tuned to downstream tasks and the obtained vectors are used as an input for deep neural classifiers. No linguistic knowledge obtained from resources and tools is utilised. In this paper we compare such universal approaches with a combination of a rich graph-based linguistically motivated sentence representation and a typical neural network classifier applied to the task of recognising CST relations in Polish. The representation describes selected levels of the sentence structure including description of lexical meanings on the basis of the wordnet (plWordNet) synsets and connected SUMO concepts. The obtained results show that in the case of difficult relations and a medium-sized training corpus, semantically enriched text representation leads to significantly better results.
