Pre-trained language models perform well on a variety of linguistic tasks that require symbolic reasoning, raising the question of whether such models implicitly represent abstract symbols and rules. We investigate this question using the case study of BERT's performance on English subject-verb agreement (SVA). Unlike prior work, we train multiple instances of BERT from scratch, allowing us to perform a series of controlled interventions at pre-training time. We show that BERT often generalizes well to subject-verb pairs that never occurred in training, suggesting a degree of rule-governed behavior. We also find, however, that performance is heavily influenced by word frequency: our experiments show that both the absolute frequency of a verb form and its frequency relative to the alternate inflection are causally implicated in the predictions BERT makes at inference time. Closer analysis of these frequency effects reveals that BERT's behavior is consistent with a system that correctly applies the SVA rule in general but struggles to overcome strong training priors and to estimate agreement features (singular vs. plural) on infrequent lexical items.
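The abstract does not spell out how an agreement prediction is scored. A common setup for masked language models, assumed here rather than taken from the paper, masks the verb and counts the item as correct when the agreeing form receives a higher log-probability than the alternate inflection. The sketch below illustrates that criterion only; the `AgreementItem` type, the example sentences, and the log-probability values are all hypothetical placeholders, not results.

```python
from dataclasses import dataclass


@dataclass
class AgreementItem:
    """One SVA test item (hypothetical structure, for illustration only)."""
    sentence: str          # sentence with the verb position masked
    correct_form: str      # verb form that agrees with the subject
    alternate_form: str    # the other inflection of the same verb
    logp_correct: float    # model log-probability of correct_form at the mask
    logp_alternate: float  # model log-probability of alternate_form at the mask


def is_correct(item: AgreementItem) -> bool:
    """An item counts as correct if the agreeing form is strictly preferred."""
    return item.logp_correct > item.logp_alternate


def accuracy(items: list[AgreementItem]) -> float:
    """Fraction of items on which the agreeing form is preferred."""
    if not items:
        raise ValueError("accuracy is undefined for an empty item list")
    return sum(is_correct(item) for item in items) / len(items)


# Made-up scores; a real evaluation would read these from a masked LM.
ITEMS = [
    AgreementItem("The dog [MASK] loudly.", "barks", "bark", -1.2, -4.0),
    AgreementItem("The dogs [MASK] loudly.", "bark", "barks", -1.5, -3.1),
    AgreementItem("The keys to the cabinet [MASK] here.", "are", "is", -2.0, -1.1),
    AgreementItem("The author of the books [MASK] tired.", "is", "are", -0.9, -2.7),
]

print(f"agreement accuracy: {accuracy(ITEMS):.2f}")
```

Comparing only the two inflections of the same verb keeps the score independent of how likely the verb lemma is overall, which is one reason this pairwise criterion is often preferred over raw top-1 prediction.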
References used
https://aclanthology.org/