
Enriching the E2E dataset

Added by: Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Keywords: dataset enriching, NLG

Abstract

This study introduces an enriched version of the E2E dataset, one of the most popular language resources for data-to-text natural language generation (NLG). We extract intermediate representations for popular pipeline tasks such as discourse ordering, text structuring, lexicalization, and referring expression generation, enabling researchers to rapidly develop and evaluate their data-to-text pipeline systems. The intermediate representations are extracted by aligning the non-linguistic input with the text through a process called delexicalization, which consists of replacing referring expressions to the input entities and attributes with placeholders. The enriched dataset is publicly available.
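The delexicalization step can be illustrated with a minimal Python sketch. It assumes E2E meaning representations (MRs) in their usual `attribute[value]` string form and matches values only by exact, case-insensitive string match; the function names and the `__ATTRIBUTE__` placeholder format are hypothetical, and this is not the authors' alignment procedure, which must also handle values that are paraphrased in the text (e.g. `familyFriendly[no]`). As one example of an intermediate representation, the sketch reads a simple discourse ordering off the delexicalized text.

```python
import re


def parse_mr(mr: str) -> dict[str, str]:
    """Parse an E2E-style MR, e.g.
    'name[The Eagle], food[Japanese]' -> {'name': 'The Eagle', 'food': 'Japanese'}."""
    return dict(re.findall(r"(\w+)\[([^\]]*)\]", mr))


def delexicalize(mr: str, text: str) -> tuple[str, list[str]]:
    """Replace verbatim mentions of MR values in `text` with hypothetical
    __ATTRIBUTE__ placeholders; return the delexicalized text and the order
    in which the matched attributes are first mentioned."""
    attrs = parse_mr(mr)
    # Replace longer values first so a short value cannot clobber part of a longer one.
    for attr, value in sorted(attrs.items(), key=lambda kv: -len(kv[1])):
        if not value:
            continue  # an empty value would match everywhere
        # Lookarounds instead of \b so values starting with symbols (e.g. '£20-25') still match.
        pattern = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
        text = re.sub(pattern, f"__{attr.upper()}__", text, flags=re.IGNORECASE)
    # Simple discourse ordering: attributes sorted by first placeholder position.
    positions = {a: text.find(f"__{a.upper()}__") for a in attrs}
    order = sorted((a for a, p in positions.items() if p >= 0), key=positions.get)
    return text, order


if __name__ == "__main__":
    mr = "name[The Eagle], eatType[coffee shop], food[Japanese], area[riverside]"
    ref = "The Eagle is a coffee shop by the riverside that serves Japanese food."
    print(delexicalize(mr, ref))
    # ('__NAME__ is a __EATTYPE__ by the __AREA__ that serves __FOOD__ food.',
    #  ['name', 'eatType', 'area', 'food'])
```

Attributes whose values are not realized verbatim are simply left out of the ordering, which is exactly the case a real alignment procedure has to resolve.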

Related research

Enriching plWordNet with morphology

Association for Computational Linguistics, 2021 (article)

In this paper, we present the process of adding morphological information to the Polish WordNet (plWordNet). We describe the reasons for this connection and the intuitions behind it, and we draw attention to the specific features of Polish morphology. We show in which tasks morphological information is important and how existing methods can be extended to use morphological information combined with the wordnet.

Keywords: enriching plWordNet, morphological information, Polish morphology

Edge: Enriching Knowledge Graph Embeddings with External Text

Association for Computational Linguistics, 2021 (article)

Knowledge graphs suffer from sparsity, which degrades the quality of the representations generated by various methods. While there is an abundance of textual information on the web and in many existing knowledge bases, aligning information across these diverse data sources remains a challenge. Previous work has partially addressed this issue by enriching knowledge graph entities based on "hard" co-occurrence of words present in the entities of the knowledge graph and in external text, whereas we achieve "soft" augmentation by proposing a knowledge graph enrichment and embedding framework named Edge. Given an original knowledge graph, we first generate a rich but noisy augmented graph using external texts at the semantic and structural levels. To distill the relevant knowledge and suppress the introduced noise, we design a graph alignment term in a shared embedding space between the original graph and the augmented graph. To enhance embedding learning on the augmented graph, we further regularize the locality relationship of the target entity based on negative sampling. Experimental results on four benchmark datasets demonstrate the robustness and effectiveness of Edge in link prediction and node classification.

Keywords: enriching knowledge graph

Enriching the Transformer with Linguistic Factors for Low-Resource Machine Translation

Association for Computational Linguistics, 2021 (article)

Introducing factors, that is, word features such as linguistic information about the source tokens, is known to improve the results of neural machine translation systems in certain settings, typically in recurrent architectures. This study proposes enhancing the current state-of-the-art neural machine translation architecture, the Transformer, so that it can incorporate external knowledge. In particular, our proposed modification, the Factored Transformer, uses linguistic factors that insert additional knowledge into the machine translation system. Apart from using different kinds of features, we study the effect of different architectural configurations. Specifically, we analyze the performance of combining words and features at the embedding level or at the encoder level, and we experiment with two different combination strategies. With the best-found configuration, we show an improvement of 0.8 BLEU over the baseline Transformer on the IWSLT German-to-English task. Moreover, we experiment with the more challenging FLoRes English-to-Nepali benchmark, which involves both an extremely low-resource setting and very distant languages, and obtain an improvement of 1.2 BLEU.
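Combining words and factors at the embedding level can be sketched in a few lines of Python. The abstract does not name its two combination strategies; concatenation and element-wise summation are common choices for factored models and are assumed here for illustration only. Small random lookup tables stand in for learned embeddings, POS tags are used as example factors, and all function names are hypothetical.

```python
import random


def make_table(vocab, dim, seed):
    """Hypothetical stand-in for a learned embedding table: token -> vector."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for tok in vocab}


def combine(word_vec, factor_vec, strategy):
    """Combine one word embedding with one factor embedding (assumed strategies)."""
    if strategy == "concat":
        # Output dimension is dim(word) + dim(factor).
        return word_vec + factor_vec
    if strategy == "sum":
        # Summation needs both embeddings to share a dimension.
        if len(word_vec) != len(factor_vec):
            raise ValueError("summation requires equal embedding dimensions")
        return [w + f for w, f in zip(word_vec, factor_vec)]
    raise ValueError(f"unknown strategy: {strategy}")


def embed_factored(tokens, factors, word_table, factor_table, strategy="concat"):
    """Embed a source sentence with one factor per token before the encoder."""
    if len(tokens) != len(factors):
        raise ValueError("each token needs exactly one factor")
    return [combine(word_table[t], factor_table[f], strategy)
            for t, f in zip(tokens, factors)]


if __name__ == "__main__":
    words = make_table(["ich", "sehe"], dim=4, seed=0)
    pos = make_table(["PRON", "VERB"], dim=2, seed=1)
    vectors = embed_factored(["ich", "sehe"], ["PRON", "VERB"], words, pos, "concat")
    print([len(v) for v in vectors])  # [6, 6]
```

Concatenation lets the factor embedding be much smaller than the word embedding, while summation keeps the model dimension unchanged; which works better is an empirical question of the kind the study investigates.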

Keywords: low-resource machine translation

The Swedish Winogender Dataset

Association for Computational Linguistics, 2021 (article)

We introduce the SweWinogender test set, a diagnostic dataset for measuring gender bias in coreference resolution. It is modelled after the English Winogender benchmark and is released with reference statistics on the distribution of men and women across occupations and on the association between gender and occupation in modern corpus material. The paper discusses the design and creation of the dataset and presents a small investigation of the supplementary statistics.

Keywords: Swedish Winogender dataset, English Winogender benchmark

Exploring the Integration of E2E ASR and Pronunciation Modeling for English Mispronunciation Detection

Association for Computational Linguistics, 2021 (article)

There has been increasing demand for effective computer-assisted pronunciation training (CAPT) systems, which can provide feedback on mispronunciations and help second-language (L2) learners improve their speaking proficiency through repeated practice. Due to the shortage of non-native speech for training the automatic speech recognition (ASR) module of a CAPT system, mispronunciation detection performance often suffers from imperfect ASR. To address this, we put forward a two-stage mispronunciation detection method. In the first stage, the speech uttered by an L2 learner is processed by an end-to-end ASR module to produce N-best phone sequence hypotheses. In the second stage, these hypotheses are fed into a pronunciation model that seeks to faithfully predict the phone sequence hypothesis most likely pronounced by the learner, thereby improving mispronunciation detection. Empirical experiments conducted on an English benchmark dataset seem to confirm the utility of our method.
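Once a phone sequence has been selected for the learner, mispronunciations are commonly located by aligning it with the canonical phone sequence of the prompt text. The abstract does not describe this step, so the Levenshtein-alignment sketch below is an assumed, generic illustration rather than the authors' method; the function names are hypothetical and ARPAbet-style phone labels are used only as an example.

```python
def align(canonical, recognized):
    """Minimum-edit-distance alignment of two phone sequences.
    Returns (canonical_phone or None, recognized_phone or None) pairs."""
    n, m = len(canonical), len(recognized)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i  # deletions only
    for j in range(1, m + 1):
        dp[0][j] = j  # insertions only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if canonical[i - 1] == recognized[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (
                0 if canonical[i - 1] == recognized[j - 1] else 1):
            pairs.append((canonical[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))  # canonical phone was dropped
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))  # extra phone was inserted
            j -= 1
    pairs.reverse()
    return pairs


def detect_mispronunciations(canonical, recognized):
    """Report substitutions, deletions, and insertions as mismatched pairs."""
    return [(c, r) for c, r in align(canonical, recognized) if c != r]


if __name__ == "__main__":
    # "think" pronounced with /s/ instead of /th/
    print(detect_mispronunciations(["TH", "IH", "NG", "K"], ["S", "IH", "NG", "K"]))
    # [('TH', 'S')]
```

The quality of such a comparison depends directly on how faithfully the selected phone sequence reflects what the learner said, which is the problem the second-stage pronunciation model targets.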

Keywords: exploring the integration, mispronunciation detection, pronunciation modeling
