New community

Subscribe to the gold package and get unlimited access to Shamra Academy

MG-BERT: Multi-Graph Augmented BERT for Masked Language Modeling

MG-BERT: برت مزيد من الرسم البياني متعدد الرسوم البيانية لمصمم لغة ملثم

295 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Pre-trained models like Bidirectional Encoder Representations from Transformers (BERT), have recently made a big leap forward in Natural Language Processing (NLP) tasks. However, there are still some shortcomings in the Masked Language Modeling (MLM) task performed by these models. In this paper, we first introduce a multi-graph including different types of relations between words. Then, we propose Multi-Graph augmented BERT (MG-BERT) model that is based on BERT. MG-BERT embeds tokens while taking advantage of a static multi-graph containing global word co-occurrences in the text corpus beside global real-world facts about words in knowledge graphs. The proposed model also employs a dynamic sentence graph to capture local context effectively. Experimental results demonstrate that our model can considerably enhance the performance in the MLM task.

References used

https://aclanthology.org/

rate research

Modeling Graph Structure via Relative Position for Text Generation from Knowledge Graphs

334 - Association for Computation Linguistics 2021 مقالة

We present Graformer, a novel Transformer-based encoder-decoder architecture for graph-to-text generation. With our novel graph self-attention, the encoding of a node relies on all nodes in the input graph - not only direct neighbors - facilitating t he detection of global patterns. We represent the relation between two nodes as the length of the shortest path between them. Graformer learns to weight these node-node relations differently for different attention heads, thus virtually learning differently connected views of the input graph. We evaluate Graformer on two popular graph-to-text generation benchmarks, AGENDA and WebNLG, where it achieves strong performance while using many fewer parameters than other approaches.

modeling graph structure structure via relative relative position هيكل الرسم البياني النمذجة هيكل عبر قريب الوضع النسبي صناعة حمض الفوسفور المزيد..

Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

279 - Association for Computation Linguistics 2021 مقالة

A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in classical NLP pipelines. In this paper, we propose a different expla nation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics. To demonstrate this, we pre-train MLMs on sentences with randomly shuffled word order, and show that these models still achieve high accuracy after fine-tuning on many downstream tasks---including tasks specifically designed to be challenging for models that ignore word order. Our models perform surprisingly well according to some parametric syntactic probes, indicating possible deficiencies in how we test representations for syntactic information. Overall, our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.

معلومات بايزي المتبادلة word matters pre-training order word matters كلمة الأمور قبل التدريب طلب كلمة الأمور صناعة حمض الفوسفور

Neural Language Modeling for Contextualized Temporal Graph Generation

319 - Association for Computation Linguistics 2021 مقالة

This paper presents the first study on using large-scale pre-trained language models for automated generation of an event-level temporal graph for a document. Despite the huge success of neural pre-training methods in NLP tasks, its potential for tem poral reasoning over event graphs has not been sufficiently explored. Part of the reason is the difficulty in obtaining large training corpora with human-annotated events and temporal links. We address this challenge by using existing IE/NLP tools to automatically generate a large quantity (89,000) of system-produced document-graph pairs, and propose a novel formulation of the contextualized graph generation problem as a sequence-to-sequence mapping task. These strategies enable us to leverage and fine-tune pre-trained language models on the system-induced training data for the graph generation task. Our experiments show that our approach is highly effective in generating structurally and semantically valid graphs. Further, evaluation on a challenging hand-labeled, out-of-domain corpus shows that our method outperforms the closest existing method by a large margin on several metrics. We also show a downstream application of our approach by adapting it to answer open-ended temporal questions in a reading comprehension setting.

neural language modeling نمذجة اللغة العصبية صناعة حمض الفوسفور

Ad Headline Generation using Self-Critical Masked Language Model

569 - Association for Computation Linguistics 2021 مقالة

For any E-commerce website it is a nontrivial problem to build enduring advertisements that attract shoppers. It is hard to pass the creative quality bar of the website, especially at a large scale. We thus propose a programmatic solution to generate product advertising headlines using retail content. We propose a state of the art application of Reinforcement Learning (RL) Policy gradient methods on Transformer (Vaswani et al., 2017) based Masked Language Models (Devlin et al., 2019). Our method creates the advertising headline by jointly conditioning on multiple products that a seller wishes to advertise. We demonstrate that our method outperforms existing Transformer and LSTM + RL methods in overlap metrics and quality audits. We also show that our model generated headlines outperform human submitted headlines in terms of both grammar and creative quality as determined by audits.

self-critical masked language generation using self-critical masked language لغة ملثم ذاتية جيل باستخدام الحرجة الذاتية لغة ملثمنة صناعة حمض الفوسفور المزيد..

Data Efficient Masked Language Modeling for Vision and Language

629 - Association for Computation Linguistics 2021 مقالة

Masked language modeling (MLM) is one of the key sub-tasks in vision-language pretraining. In the cross-modal setting, tokens in the sentence are masked at random, and the model predicts the masked tokens given the image and the text. In this paper, we observe several key disadvantages of MLM in this setting. First, as captions tend to be short, in a third of the sentences no token is sampled. Second, the majority of masked tokens are stop-words and punctuation, leading to under-utilization of the image. We investigate a range of alternative masking strategies specific to the cross-modal setting that address these shortcomings, aiming for better fusion of text and image in the learned representation. When pre-training the LXMERT model, our alternative masking strategies consistently improve over the original masking strategy on three downstream tasks, especially in low resource settings. Further, our pre-training approach substantially outperforms the baseline model on a prompt-based probing task designed to elicit image objects. These results and our analysis indicate that our method allows for better utilization of the training data.

efficient masked language لغة ملثمفة فعالة صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

MG-BERT: Multi-Graph Augmented BERT for Masked Language Modeling

MG-BERT: برت مزيد من الرسم البياني متعدد الرسوم البيانية لمصمم لغة ملثم

Ask ChatGPT about the research

Read More

suggested questions