Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Transfer-based Enrichment of a Hungarian Named Entity Dataset

التخصيب القائم على النقل لمجموعة بيانات كيكة مجرية

680 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

hungarian named entity named entity dataset czech named entity الكيان المجاري المسمى مجموعة بيانات الكيان المسماة Czech اسمي كيان صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In this paper, we present a major update to the first Hungarian named entity dataset, the Szeged NER corpus. We used zero-shot cross-lingual transfer to initialize the enrichment of entity types annotated in the corpus using three neural NER models: two of them based on the English OntoNotes corpus and one based on the Czech Named Entity Corpus finetuned from multilingual neural language models. The output of the models was automatically merged with the original NER annotation, and automatically and manually corrected and further enriched with additional annotation, like qualifiers for various entity types. We present the evaluation of the zero-shot performance of the two OntoNotes-based models and a transformer-based new NER model trained on the training part of the final corpus. We release the corpus and the trained model.

References used

https://aclanthology.org/

rate research

WikiAsp: A Dataset for Multi-domain Aspect-based Summarization

529 - Association for Computation Linguistics 2021 مقالة

Abstract Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due to large differences in the type of aspects for different domains (e.g., sentiment, product features), the development of previous models has tended to be domain-specific. In this paper, we propose WikiAsp,1 a large-scale dataset for multi-domain aspect- based summarization that attempts to spur research in the direction of open-domain aspect-based summarization. Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation. We propose several straightforward baseline models for this task and conduct experiments on the dataset. Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.

aspect-based summarization multi-domain aspect-based summarization abstract aspect-based summarization تلخيص القائم على الجانب تلخيص القائم على الجانب متعدد المجالات التلخيص القائم على الجانب المجردة صناعة حمض الفوسفور المزيد..

TR-SEQ: Named Entity Recognition Dataset for Turkish Search Engine Queries

809 - Association for Computation Linguistics 2021 مقالة

Recognizing named entities in short search engine queries is a difficult task due to their weaker contextual information compared to long sentences. Standard named entity recognition (NER) systems that are trained on grammatically correct and long se ntences fail to perform well on such queries. In this study, we share our efforts towards creating a cleaned and labeled dataset of real Turkish search engine queries (TR-SEQ) and introduce an extended label set to satisfy the search engine needs. A NER system is trained by applying the state-of-the-art deep learning method BERT to the collected data and its high performance on search engine queries is reported. Moreover, we compare our results with the state-of-the-art Turkish NER systems.

search engine queries استفسارات محرك البحث صناعة حمض الفوسفور

Transfer Learning for Czech Historical Named Entity Recognition

595 - Association for Computation Linguistics 2021 مقالة

Nowadays, named entity recognition (NER) achieved excellent results on the standard corpora. However, big issues are emerging with a need for an application in a specific domain, because it requires a suitable annotated corpus with adapted NE tag-set . This is particularly evident in the historical document processing field. The main goal of this paper consists of proposing and evaluation of several transfer learning methods to increase the score of the Czech historical NER. We study several information sources, and we use two neural nets for NE modeling and recognition. We employ two corpora for evaluation of our transfer learning methods, namely Czech named entity corpus and Czech historical named entity corpus. We show that BERT representation with fine-tuning and only the simple classifier trained on the union of corpora achieves excellent results.

أسلوب historical named entity كيان تاريخي الاسم صناعة حمض الفوسفور

Evaluation of Taxonomy Enrichment on Diachronic WordNet Versions

663 - Association for Computation Linguistics 2021 مقالة

The vast majority of the existing approaches for taxonomy enrichment apply word embeddings as they have proven to accumulate contexts (in a broad sense) extracted from texts which are sufficient for attaching orphan words to the taxonomy. On the othe r hand, apart from being large lexical and semantic resources, taxonomies are graph structures. Combining word embeddings with graph structure of taxonomy could be of use for predicting taxonomic relations. In this paper we compare several approaches for attaching new words to the existing taxonomy which are based on the graph representations with the one that relies on fastText embeddings. We test all methods on Russian and English datasets, but they could be also applied to other wordnets and languages.

diachronic wordnet versions enrichment on diachronic wordnet versions إصدارات Wordnet Diachronic. التخصيب على dichronic. إصدارات WordNet. صناعة حمض الفوسفور المزيد..

Clustering-based Inference for Biomedical Entity Linking

562 - Association for Computation Linguistics 2021 مقالة

Due to large number of entities in biomedical knowledge bases, only a small fraction of entities have corresponding labelled training data. This necessitates entity linking models which are able to link mentions of unseen entities using learned repre sentations of entities. Previous approaches link each mention independently, ignoring the relationships within and across documents between the entity mentions. These relations can be very useful for linking mentions in biomedical text where linking decisions are often difficult due mentions having a generic or a highly specialized form. In this paper, we introduce a model in which linking decisions can be made not merely by linking to a knowledge base entity but also by grouping multiple mentions together via clustering and jointly making linking predictions. In experiments on the largest publicly available biomedical dataset, we improve the best independent prediction for entity linking by 3.0 points of accuracy, and our clustering-based inference model further improves entity linking by 2.3 points.

linking entity linking entity الرابط الشبكة الربط صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Transfer-based Enrichment of a Hungarian Named Entity Dataset

التخصيب القائم على النقل لمجموعة بيانات كيكة مجرية

Ask ChatGPT about the research

Read More

suggested questions