In this work, we present a method for content selection and document planning for automated news and report generation from structured statistical data, such as that offered by the European Union's statistical agency, EuroStat. The method is data-driven and highly topic-independent within the statistical dataset domain. Because our approach is not based on machine learning, it is suitable for introducing news automation to the wide variety of domains where no training data is available. It therefore serves as a baseline for document structuring that is low-cost in terms of implementation effort and can be used before domain-specific knowledge is introduced.