
Automatic Entity State Annotation using the VerbNet Semantic Parser


Publication date: 2021
Language: English
Created by Shamra Editor





Tracking entity states is a natural language processing task assumed to require human annotation. To reduce the time and expense associated with annotation, we introduce a new method to automatically extract entity states, including the location and existence of entities, following Dalvi et al. (2018) and Tandon et al. (2020). For this purpose, we rely primarily on the semantic representations generated by the state-of-the-art VerbNet parser (Gung, 2020) and extract the entities (event participants) and their states from the semantic predicates of the generated VerbNet semantic representation, which is in propositional logic format. For evaluation, we used ProPara (Dalvi et al., 2018), a reading comprehension dataset annotated with entity states in each sentence, tracking those states across paragraphs of natural, human-authored procedural text. Given the limitations of the method, the peculiarities of the ProPara annotations, and the fact that our system, Lexis, uses no task-specific training data and relies solely on VerbNet, the results are promising and showcase the value of lexical resources.
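The extraction step can be pictured with a small sketch. The following is a minimal illustration, not the authors' Lexis code: it assumes a simplified input format in which each VerbNet semantic predicate arrives as a (negated, name, arguments) tuple, and it reads location and existence states off the has_location and exists predicates. The predicate names follow VerbNet's inventory, but the input format and state labels here are our assumptions.

```python
# Minimal sketch (not the authors' code) of reading entity states off
# VerbNet-style semantic predicates. Each predicate is a hypothetical
# (negated, name, args) tuple extracted from the parser output.

from typing import NamedTuple

class Predicate(NamedTuple):
    negated: bool   # True for predicates under logical negation
    name: str       # e.g. "has_location", "exists"
    args: tuple     # event variable followed by participant arguments

def extract_states(predicates):
    """Map semantic predicates to entity state annotations."""
    states = []
    for p in predicates:
        if p.name == "has_location" and len(p.args) >= 3:
            _, entity, location = p.args[:3]
            # has_location asserts the entity is at a location;
            # a negated has_location marks it leaving that location.
            states.append((entity, "not_at" if p.negated else "at", location))
        elif p.name == "exists" and len(p.args) >= 2:
            entity = p.args[1]
            # A negated exists predicate signals destruction of the entity.
            states.append((entity, "destroyed" if p.negated else "exists", None))
    return states

# "Water evaporates": the parser might assert the theme's initial location
# and then negate it for the result state of the event.
preds = [
    Predicate(False, "has_location", ("e1", "water", "initial_location")),
    Predicate(True, "has_location", ("e2", "water", "initial_location")),
]
print(extract_states(preds))
# [('water', 'at', 'initial_location'), ('water', 'not_at', 'initial_location')]
```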



References used
https://aclanthology.org/
Related research

Despite recent advances in semantic role labeling propelled by pre-trained text encoders like BERT, performance lags behind when applied to predicates observed infrequently during training or to sentences in new domains. In this work, we investigate how role labeling performance on low-frequency predicates and out-of-domain data can be further improved by using VerbNet, a verb lexicon that groups verbs into hierarchical classes based on shared syntactic and semantic behavior and defines semantic representations describing relations between arguments. We find that VerbNet classes provide an effective level of abstraction, improving generalization on low-frequency predicates by allowing them to learn from the training examples of other predicates belonging to the same class. We also find that joint training of VerbNet role labeling and predicate disambiguation of VerbNet classes for polysemous verbs leads to improvements in both tasks, naturally supporting the extraction of VerbNet's semantic representations.
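As a rough illustration of the joint training idea, here is a minimal sketch, not the paper's implementation: a shared encoder (standing in for a pre-trained model such as BERT) feeds two heads, one predicting a role label per token and one disambiguating the predicate's VerbNet class, and the two losses are summed so both tasks update the shared parameters. All dimensions and label counts below are placeholders.

```python
# Minimal sketch of joint role labeling and VerbNet class disambiguation.
# The "encoder" is a stand-in for a pre-trained text encoder.

import torch
import torch.nn as nn

class JointVerbNetModel(nn.Module):
    def __init__(self, hidden=768, n_roles=40, n_classes=300):
        super().__init__()
        self.encoder = nn.Linear(hidden, hidden)        # placeholder encoder
        self.role_head = nn.Linear(hidden, n_roles)     # per-token role labels
        self.class_head = nn.Linear(hidden, n_classes)  # per-predicate class

    def forward(self, token_reprs, predicate_repr):
        h = torch.relu(self.encoder(token_reprs))
        hp = torch.relu(self.encoder(predicate_repr))
        return self.role_head(h), self.class_head(hp)

model = JointVerbNetModel()
tokens = torch.randn(12, 768)    # one sentence, 12 token vectors
predicate = torch.randn(1, 768)  # vector for the predicate token
role_logits, class_logits = model(tokens, predicate)
loss = (nn.functional.cross_entropy(role_logits, torch.randint(0, 40, (12,)))
        + nn.functional.cross_entropy(class_logits, torch.randint(0, 300, (1,))))
loss.backward()  # both heads update the shared encoder parameters
```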
In this paper, we present the results of our experiments concerning the zero-shot cross-lingual performance of the PERIN sentence-to-graph semantic parser. We applied the PTG model trained using the PERIN parser on a 740k-token Czech newspaper corpus to Hungarian. We evaluated the performance of the parser using the official evaluation tool of the MRP 2020 shared task. The gold-standard Hungarian annotation was created by manually correcting the output of the parser following the annotation manual of the tectogrammatical level of the Prague Dependency Treebank. An English model trained on a larger one-million-token English newspaper corpus is also available; however, we found that the Czech model performed significantly better on Hungarian input, since Hungarian is typologically more similar to Czech than to English. We conclude that zero-shot transfer of the PTG meaning representation across typologically not-too-distant languages, using a neural parser based on a multilingual contextual language model followed by manual correction by expert linguists, is a viable scenario.
Computational resources such as semantically annotated corpora can play an important role in enabling speakers of indigenous minority languages to participate in government, education, and other domains of public life in their own language. However, many languages, mainly those with small native-speaker populations and without written traditions, have little or no digital support. One hurdle in creating such resources is that, for many languages, few speakers would be capable of annotating texts, a task which requires literacy and some linguistic training, and these experts' time is typically in high demand for language planning work. This paper assesses whether typologically trained non-speakers of an indigenous language can feasibly perform semantic annotation using Uniform Meaning Representations, thus allowing for the creation of computational materials without putting further strain on community resources.
In recent years, a new web has appeared alongside the traditional one: the Web of Linked Data. It was developed to present data in a machine-readable form. The main idea is to describe data using a set of terms called a web ontology. Tools and standards related to the semantic web are now comprehensive and stable; however, publishing university data as linked data still faces some major challenges. First of all, there is no unified, well-accepted vocabulary for describing university-related information. This article aims to find an ontology that can describe data in the university domain, so that this data can be integrated with data from other universities and queried. The web ontology was built by reusing vocabularies available on the web and adding new classes and properties, and was organized using Protégé.
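As an illustrative sketch of this approach (the namespace and class names below are hypothetical, not the article's actual ontology): a few university facts expressed as RDF triples, reusing the existing FOAF vocabulary alongside a custom namespace, then queried with SPARQL via rdflib.

```python
# Hypothetical example: publish university facts as RDF, reusing FOAF
# for people, and query them with SPARQL.

from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import FOAF

UNI = Namespace("http://example.org/university#")  # assumed vocabulary

g = Graph()
g.add((UNI.cs101, RDF.type, UNI.Course))
g.add((UNI.cs101, UNI.taughtBy, UNI.prof_haddad))
g.add((UNI.prof_haddad, RDF.type, FOAF.Person))
g.add((UNI.prof_haddad, FOAF.name, Literal("A. Haddad")))

# Which persons teach a course?
q = """
PREFIX uni: <http://example.org/university#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
    ?course a uni:Course ;
            uni:taughtBy ?p .
    ?p foaf:name ?name .
}
"""
for row in g.query(q):
    print(row.name)  # -> A. Haddad
```

Because the data reuses shared vocabularies such as FOAF, the same query pattern would work across graphs published by different universities, which is the integration benefit the article describes.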
While powerful pre-trained language models have improved the fluency of text generation models, semantic adequacy, the ability to generate text that is semantically faithful to the input, remains an unsolved issue. In this paper, we introduce a novel automatic evaluation metric, Entity-Based Semantic Adequacy, which can be used to assess to what extent generation models that verbalise RDF (Resource Description Framework) graphs produce text containing mentions of the entities occurring in the RDF input. This is important, as RDF subject and object entities make up two thirds of the input. We use our metric to compare 25 models from the WebNLG Shared Tasks and examine the correlation with results from human evaluations of semantic adequacy. We show that while our metric correlates with human evaluation scores, this correlation varies with the specifics of the human evaluation setup. This suggests that, to measure the entity-based adequacy of generated texts, an automatic metric such as the one proposed here may be more reliable, being less subjective and more focused on correct verbalisation of the input, than human evaluation measures.
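To make the metric concrete, here is a minimal sketch of an entity-based adequacy score; the matching heuristic (underscore normalization, case-insensitive substring search) is our assumption, not the paper's exact procedure.

```python
# Sketch: score a generated text by the fraction of RDF subject/object
# entities it mentions. Matching heuristic is an assumption.

def entity_adequacy(rdf_triples, generated_text):
    """Fraction of input entities mentioned in the generated text."""
    text = generated_text.lower()
    # Subjects and objects form the entity set; predicates are ignored.
    entities = {s for s, _, o in rdf_triples} | {o for s, _, o in rdf_triples}
    # Naive matching: underscores to spaces, case-insensitive substring test.
    mentioned = sum(1 for e in entities
                    if e.replace("_", " ").lower() in text)
    return mentioned / len(entities) if entities else 0.0

triples = [("Alan_Bean", "birthPlace", "Wheeler_Texas"),
           ("Alan_Bean", "occupation", "Test_pilot")]
text = "Alan Bean was a test pilot born in Wheeler Texas."
print(round(entity_adequacy(triples, text), 2))  # -> 1.0
```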
