Pretraining techniques leveraging enormous datasets have driven recent advances in text summarization. While folk explanations suggest that knowledge transfer accounts for pretraining's benefits, little is known about why it works or what makes a pretraining task or dataset suitable. In this paper, we challenge the knowledge transfer story, showing that by pretraining on documents consisting of character n-grams selected at random, we can nearly match the performance of models pretrained on real corpora. This work holds the promise of eliminating upstream corpora, which may alleviate some concerns over offensive language, bias, and copyright issues. To see whether the small residual benefit of using real data could be accounted for by the structure of the pretraining task, we design several tasks motivated by a qualitative study of summarization corpora. However, these tasks confer no appreciable benefit, leaving open the possibility of a small role for knowledge transfer.
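As a rough illustration of the kind of synthetic upstream corpus the abstract describes, the following Python sketch assembles "documents" from character n-grams drawn at random. The n-gram length, vocabulary size, alphabet, and whitespace tokenization are illustrative assumptions, not details taken from the paper.

```python
import random
import string


def random_ngram(n: int, alphabet: str = string.ascii_lowercase) -> str:
    """Sample a single character n-gram uniformly at random."""
    return "".join(random.choice(alphabet) for _ in range(n))


def synthetic_document(num_tokens: int = 200, n: int = 5, vocab_size: int = 5000) -> str:
    """Build one synthetic pretraining document from a vocabulary of random n-grams.

    A finite vocabulary of random n-grams is drawn first, then tokens are sampled
    from it, so the corpus has repeated word-like units but no real-world content.
    """
    vocab = [random_ngram(n) for _ in range(vocab_size)]
    return " ".join(random.choice(vocab) for _ in range(num_tokens))


if __name__ == "__main__":
    random.seed(0)
    # Print one short synthetic document as a sanity check.
    print(synthetic_document(num_tokens=20))
```

A corpus of such documents could then be fed to any standard pretraining objective in place of real text, which is the substitution the abstract's experiments probe.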