We observe a severe under-reporting of the different kinds of errors that Natural Language Generation (NLG) systems make. This is a problem, because mistakes are an important indicator of where systems still need to be improved. If authors report only overall performance metrics, the research community is left in the dark about the specific weaknesses exhibited by 'state-of-the-art' research. In addition to quantifying the extent of error under-reporting, this position paper provides recommendations for error identification, analysis, and reporting.