نحن التحقيق في نماذج لغة المحولات المدربة مسبقا لسد الاستدلال.نقوم أولا بالتحقيق في رؤوس الاهتمام الفردي في بيرت ومراقبة أن رؤساء الاهتمام في طبقات أعلى تركز بشكل بارز على سد العلاقات داخل المقارنة مع الطبقات المنخفضة والمتوسطة، وكذلك عدد قليل من رؤساء اهتمامات محددة يركزون باستمرار على سد.الأهم من ذلك، نحن نفكر في نماذج اللغة ككل في نهجنا الثاني حيث يتم صياغة دقة سد العسرة كمهمة تتنبئة رمزية مثيرة للمثنين (من اختبار Cloze).تنتج صياغتنا نتائج متفائلة دون أي ضبط جيد، مما يشير إلى أن نماذج اللغة المدربة مسبقا تلتقط بشكل كبير في سد الاستدلال.يوضح تحقيقنا الإضافي أن المسافة بين المداعين - السابقة وسوء السياق المقدمة إلى النماذج اللغوية تلعب دورا مهما في الاستدلال.
We probe pre-trained transformer language models for bridging inference. We first investigate individual attention heads in BERT and observe that attention heads at higher layers prominently focus on bridging relations in-comparison with the lower and middle layers, also, few specific attention heads concentrate consistently on bridging. More importantly, we consider language models as a whole in our second approach where bridging anaphora resolution is formulated as a masked token prediction task (Of-Cloze test). Our formulation produces optimistic results without any fine-tuning, which indicates that pre-trained language models substantially capture bridging inference. Our further investigation shows that the distance between anaphor-antecedent and the context provided to language models play an important role in the inference.
References used
https://aclanthology.org/
Pre-trained multilingual language models have become an important building block in multilingual Natural Language Processing. In the present paper, we investigate a range of such models to find out how well they transfer discourse-level knowledge acr
This work demonstrates the development process of a machine learning architecture for inference that can scale to a large volume of requests. We used a BERT model that was fine-tuned for emotion analysis, returning a probability distribution of emoti
Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks. In this paper, we introduce document-level discourse probing to evaluate the ability of pretrained LMs to capture document-level
The success of language models based on the Transformer architecture appears to be inconsistent with observed anisotropic properties of representations learned by such models. We resolve this by showing, contrary to previous studies, that the represe
The success of large-scale contextual language models has attracted great interest in probing what is encoded in their representations. In this work, we consider a new question: to what extent contextual representations of concrete nouns are aligned