We address the navigation problem in which an agent follows natural language instructions while observing its environment. Focusing on language understanding, we show the importance of spatial semantics in grounding navigation instructions in visual perception. We propose a neural agent that uses the elements of spatial configurations, and we investigate their influence on the agent's reasoning ability. Moreover, we model the sequential execution order and align visual objects with the spatial configurations in the instruction. Our neural agent improves over strong baselines in seen environments and shows competitive performance in unseen environments. Additionally, the experimental results demonstrate that explicitly modeling spatial semantic elements in the instructions can improve the grounding and spatial reasoning of the model.
References used
https://aclanthology.org/