Understanding and executing natural language instructions in a grounded domain is one of the hallmarks of artificial intelligence. In this paper, we focus on instruction understanding in the blocks world domain and investigate the language understanding abilities of two top-performing systems for the task. We aim to understand if the test performance of these models indicates an understanding of the spatial domain and of the natural language instructions relative to it, or whether they merely over-fit spurious signals in the dataset. We formulate a set of expectations one might have from an instruction following model and concretely characterize the different dimensions of robustness such a model should possess. Despite decent test performance, we find that state-of-the-art models fall short of these expectations and are extremely brittle. We then propose a learning strategy that involves data augmentation and show through extensive experiments that the proposed learning strategy yields models that are competitive on the original test set while satisfying our expectations much better.
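The abstract names data augmentation as the proposed remedy but does not specify the transforms. As a minimal sketch of one plausible augmentation for a blocks-world instruction dataset (the block names, coordinates, and the mirror transform itself are illustrative assumptions, not the paper's method), one can mirror each training pair: swap "left"/"right" in the instruction text and negate the x-coordinate of the target position, so a model cannot exploit absolute-position shortcuts:

```python
import re

# Hypothetical augmentation: mirror a (instruction, target position)
# training pair across the vertical axis.
MIRROR = {"left": "right", "right": "left"}

def mirror_instruction(instruction: str, target_xyz: tuple) -> tuple:
    """Return a mirrored (instruction, target) pair for augmentation."""
    # Swap spatial terms in the instruction text.
    swapped = re.sub(
        r"\b(left|right)\b",
        lambda m: MIRROR[m.group(1)],
        instruction,
    )
    # Negate x so the grounded target matches the swapped language.
    x, y, z = target_xyz
    return swapped, (-x, y, z)

pair = mirror_instruction(
    "place the SRI block left of the Texaco block", (1.5, 0.0, 2.0)
)
```

Adding both the original and mirrored pair to the training set is one way such an augmentation could discourage reliance on spurious positional signals.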