This paper presents a global summarization method for live sports commentaries for which a human-written summary is available. The method is based on a neural generative summarizer. The amount of data available for training is limited compared to the corpora commonly used by neural summarizers. We propose to help the summarizer learn from this limited data by reducing the entropy of the input texts. This step is performed by classifying commentary units into categories derived from a detailed analysis of the human-written summaries and filtering the input accordingly. We show that this filtering helps the summarization system overcome the lack of resources. However, several points for improvement emerged from this preliminary study, which we discuss and plan to address in future work.
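The sketch below illustrates the kind of entropy-reducing filtering step the abstract describes: classify each commentary line into an event category, then keep only the categories that matter for the final summary. Everything in it is a hypothetical illustration, not the authors' pipeline: the category names, the keyword-based classifier, and the sample feed are all invented here, whereas the paper derives its categories from an analysis of the human-written summaries.

```python
# Minimal sketch of classification-based filtering of live commentary,
# assuming a fixed category set and a toy keyword classifier (both
# hypothetical; the paper's categories come from corpus analysis).

from typing import List

# Hypothetical event categories worth keeping for the summary.
KEEP_CATEGORIES = {"goal", "card", "substitution"}

# Toy keyword lexicon standing in for a trained classifier.
KEYWORDS = {
    "goal": ["goal", "scores", "header"],
    "card": ["yellow card", "red card", "booked"],
    "substitution": ["substitution", "comes on", "replaced by"],
}

def classify(comment: str) -> str:
    """Assign a commentary line to a coarse event category."""
    lowered = comment.lower()
    for category, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            return category
    return "other"  # low-salience chatter, dropped by the filter

def filter_commentary(comments: List[str]) -> List[str]:
    """Keep only lines whose category matters for the summary,
    lowering the entropy of what the neural summarizer must model."""
    return [c for c in comments if classify(c) in KEEP_CATEGORIES]

if __name__ == "__main__":
    live_feed = [
        "The crowd is singing loudly tonight.",
        "45' GOAL! Smith scores with a header from the corner.",
        "67' Substitution: Jones comes on for Brown.",
        "The rain has started to fall.",
        "89' Yellow card for a late challenge.",
    ]
    for line in filter_commentary(live_feed):
        print(line)  # the filtered lines would feed the summarizer
```

Under these assumptions, only the goal, substitution, and card lines survive the filter, so the generative model sees a shorter, more predictable input.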