Deep learning models exhibit a preference for statistical fitting over logical reasoning. Spurious correlations may be memorized when statistical bias exists in the training data, which severely limits model performance, especially in small-data scenarios. In this work, we introduce the Counterfactual Adversarial Training (CAT) framework to tackle this problem from a causality perspective. Specifically, for a given sample, CAT first generates a counterfactual representation through latent-space interpolation in an adversarial manner, and then performs Counterfactual Risk Minimization (CRM) on each original-counterfactual pair to adjust the sample-wise loss weight dynamically, which encourages the model to explore the true causal effect. Extensive experiments demonstrate that CAT achieves substantial performance improvements over SOTA across different downstream tasks, including sentence classification, natural language inference, and question answering.
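The abstract does not give the exact formulation, but the two steps it describes can be sketched concretely. Below is a minimal PyTorch illustration, assuming a classifier head over latent representations; the function name `cat_loss`, the sign-gradient search over the interpolation coefficient, and the softmax-based reweighting are all illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

def cat_loss(classifier, h, h_paired, y, steps=3, step_size=0.1):
    """Illustrative CAT objective for one batch (an assumed sketch).

    classifier: maps latent representations [B, D] to logits [B, C].
    h, h_paired: latent representations of each sample and of a randomly
        paired sample from the same batch.
    y: gold labels [B]. All hyperparameters are placeholders.
    """
    # Step 1: adversarial latent-space interpolation. Search for the
    # per-sample mixing coefficient lambda that makes the interpolated
    # (counterfactual) representation hardest to classify correctly.
    h_d, h_p_d = h.detach(), h_paired.detach()
    lam = torch.full((h.size(0), 1), 0.5, requires_grad=True)
    for _ in range(steps):
        h_cf = lam * h_d + (1.0 - lam) * h_p_d
        adv_loss = F.cross_entropy(classifier(h_cf), y)
        (grad,) = torch.autograd.grad(adv_loss, lam)
        # Gradient-ascent step on lambda, kept inside [0, 1].
        lam = (lam + step_size * grad.sign()).clamp(0.0, 1.0)
        lam = lam.detach().requires_grad_(True)

    # Counterfactual representation built with the adversarial lambda.
    lam = lam.detach()
    h_cf = lam * h + (1.0 - lam) * h_paired

    # Step 2: CRM-style dynamic reweighting. Samples whose loss grows
    # the most under the counterfactual are presumed to rely on spurious
    # correlations, so their original loss is up-weighted.
    loss_orig = F.cross_entropy(classifier(h), y, reduction="none")
    loss_cf = F.cross_entropy(classifier(h_cf), y, reduction="none")
    with torch.no_grad():
        weights = torch.softmax(loss_cf - loss_orig, dim=0) * y.size(0)
    return (weights * loss_orig).mean() + loss_cf.mean()
```

In practice, `h` and `h_paired` would presumably come from the model's encoder (e.g., BERT-style sentence representations), with the adversarial interpolation confined to the latent space so the counterfactual never has to be realized as actual text.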