بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

The Parallel Meaning Bank: A Framework for Semantically Annotating Multiple Languages

66 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Lasha Abzianidze

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Lasha Abzianidze - Rik van Noord - Chunliu Wang

الحساب واللغة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper gives a general description of the ideas behind the Parallel Meaning Bank, a framework with the aim to provide an easy way to annotate compositional semantics for texts written in languages other than English. The annotation procedure is semi-automatic, and comprises seven layers of linguistic information: segmentation, symbolisation, semantic tagging, word sense disambiguation, syntactic structure, thematic role labelling, and co-reference. New languages can be added to the meaning bank as long as the documents are based on translations from English, but also introduce new interesting challenges on the linguistics assumptions underlying the Parallel Meaning Bank.

قيم البحث

87 - Arman Kabiri , Paul Cook 2020

Most prior work on definition modeling has not accounted for polysemy, or has done so by considering definition modeling for a target word in a given context. In contrast, in this study, we propose a context-agnostic approach to definition modeling, based on multi-sense word embeddings, that is capable of generating multiple definitions for a target word. In further, contrast to most prior work, which has primarily focused on English, we evaluate our proposed approach on fifteen different datasets covering nine languages from several language families. To evaluate our approach we consider several variations of BLEU. Our results demonstrate that our proposed multi-sense model outperforms a single-sense model on all fifteen datasets.

الحساب واللغة التعلم الآلي

Strong Baselines for Complex Word Identification across Multiple Languages

97 - Pierre Finnimore , Elisabeth Fritzsch , Daniel King 2019

Complex Word Identification (CWI) is the task of identifying which words or phrases in a sentence are difficult to understand by a target audience. The latest CWI Shared Task released data for two settings: monolingual (i.e. train and test in the sam e language) and cross-lingual (i.e. test in a language not seen during training). The best monolingual models relied on language-dependent features, which do not generalise in the cross-lingual setting, while the best cross-lingual model used neural networks with multi-task learning. In this paper, we present monolingual and cross-lingual CWI models that perform as well as (or better than) most models submitted to the latest CWI Shared Task. We show that carefully selected features and simple learning models can achieve state-of-the-art performance, and result in strong baselines for future development in this area. Finally, we discuss how inconsistencies in the annotation of the data can explain some of the results obtained.

الحساب واللغة

BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages

147 - Abhishek Shivkumar , Jack Weston , Raphael Lenain 2020

We introduce BlaBla, an open-source Python library for extracting linguistic features with proven clinical relevance to neurological and psychiatric diseases across many languages. BlaBla is a unifying framework for accelerating and simplifying clini cal linguistic research. The library is built on state-of-the-art NLP frameworks and supports multithreaded/GPU-enabled feature extraction via both native Python calls and a command line interface. We describe BlaBlas architecture and clinical validation of its features across 12 diseases. We further demonstrate the application of BlaBla to a task visualizing and classifying language disorders in three languages on real clinical data from the AphasiaBank dataset. We make the codebase freely available to researchers with the hope of providing a consistent, well-validated foundation for the next generation of clinical linguistic research.

الحساب واللغة التعلم الآلي

RTE: A Tool for Annotating Relation Triplets from Text

65 - Ankan Mullick , Animesh Bera , Tapas Nayak 2021

In this work, we present a Web-based annotation tool `Relation Triplets Extractor footnote{https://abera87.github.io/annotate/} (RTE) for annotating relation triplets from the text. Relation extraction is an important task for extracting structured i nformation about real-world entities from the unstructured text available on the Web. In relation extraction, we focus on binary relation that refers to relations between two entities. Recently, many supervised models are proposed to solve this task, but they mostly use noisy training data obtained using the distant supervision method. In many cases, evaluation of the models is also done based on a noisy test dataset. The lack of annotated clean dataset is a key challenge in this area of research. In this work, we built a web-based tool where researchers can annotate datasets for relation extraction on their own very easily. We use a server-less architecture for this tool, and the entire annotation operation is processed using client-side code. Thus it does not suffer from any network latency, and the privacy of the users data is also maintained. We hope that this tool will be beneficial for the researchers to advance the field of relation extraction.

الحساب واللغة

An Unsupervised Method for Building Sentence Simplification Corpora in Multiple Languages

350 - Xinyu Lu , Jipeng Qiang , Yun Li 2021

The availability of parallel sentence simplification (SS) is scarce for neural SS modelings. We propose an unsupervised method to build SS corpora from large-scale bilingual translation corpora, alleviating the need for SS supervised corpora. Our met hod is motivated by the following two findings: neural machine translation model usually tends to generate more high-frequency tokens and the difference of text complexity levels exists between the source and target language of a translation corpus. By taking the pair of the source sentences of translation corpus and the translations of their references in a bridge language, we can construct large-scale pseudo parallel SS data. Then, we keep these sentence pairs with a higher complexity difference as SS sentence pairs. The building SS corpora with an unsupervised approach can satisfy the expectations that the aligned sentences preserve the same meanings and have difference in text complexity levels. Experimental results show that SS methods trained by our corpora achieve the state-of-the-art results and significantly outperform the results on English benchmark WikiLarge.

الحساب واللغة استرجاع المعلومات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الحواش الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

The Parallel Meaning Bank: A Framework for Semantically Annotating Multiple Languages

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً