Despite its proven effectiveness in other fields, data augmentation is less popular in natural language processing (NLP) because of its complexity and limited results. For example, a recent study (Longpre et al., 2020) showed that task-agnostic data augmentations fail to consistently boost the performance of pretrained transformers, even in low-data regimes. In this paper, we investigate whether data-driven augmentation scheduling and the integration of a wider set of transformations can improve performance where fixed and limited policies were unsuccessful. Our results suggest that, while this approach can help the training process in some settings, the improvements are insubstantial. This negative result is meant to help researchers better understand the limitations of data augmentation for NLP.