إن جودة أنظمة تبسيط النص الآلي بالكامل ليست جيدة بما يكفي للاستخدام في إعدادات العالم الحقيقي؛بدلا من ذلك، يتم استخدام التبسيط البشري.في هذه الورقة، ندرس كيفية تحسين تكلفة وجودة التبسيط البشري من خلال الاستفادة من الجماعة الجماعية.نقدم نهج الانصهار الجملة في الرسم البياني لزيادة التبسيط البشري ونهج إعادة النشر لكل من تحديد المبسط عالية الجودة والسماح باستهداف التبسيط بمستويات متفاوتة من البساطة.باستخدام DataSet Newsela (XU et al.، 2015) نظهر تحسينات متسقة على الخبراء في مستويات تبسيط مختلفة وتجد أن تبسيط الانصهار الجملة الإضافية تسمح بإخراج أبسط من التبسيط البشري وحدها.
The quality of fully automated text simplification systems is not good enough for use in real-world settings; instead, human simplifications are used. In this paper, we examine how to improve the cost and quality of human simplifications by leveraging crowdsourcing. We introduce a graph-based sentence fusion approach to augment human simplifications and a reranking approach to both select high quality simplifications and to allow for targeting simplifications with varying levels of simplicity. Using the Newsela dataset (Xu et al., 2015) we show consistent improvements over experts at varying simplification levels and find that the additional sentence fusion simplifications allow for simpler output than the human simplifications alone.
References used
https://aclanthology.org/
Recently, a large pre-trained language model called T5 (A Unified Text-to-Text Transfer Transformer) has achieved state-of-the-art performance in many NLP tasks. However, no study has been found using this pre-trained model on Text Simplification. Th
Sentence fusion is a conditional generation task that merges several related sentences into a coherent one, which can be deemed as a summary sentence. The importance of sentence fusion has long been recognized by communities in natural language gener
This paper describes SimpleNER, a model developed for the sentence simplification task at GEM-2021. Our system is a monolingual Seq2Seq Transformer architecture that uses control tokens pre-pended to the data, allowing the model to shape the generate
Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting. Current simplification systems are predominantly sequence-to-sequence models that are trained
To build automated simplification systems, corpora of complex sentences and their simplified versions is the first step to understand sentence complexity and enable the development of automatic text simplification systems. We present a lexical and sy