Text simplification is a growing field with many potentially useful applications. Training text simplification algorithms generally requires a large amount of annotated data, but few corpora are suitable for this task. We propose a new unsupervised method for aligning text, based on Doc2Vec embeddings and a new alignment algorithm, that can align texts at different levels. An initial evaluation shows promising results for the new approach. We used the newly developed approach to create a new monolingual parallel corpus composed of the works of English early modern philosophers and their corresponding simplified versions.
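To make the idea of embedding-based alignment concrete, here is a minimal sketch. It is not the method proposed in the paper, whose details the abstract does not give. Two things are assumptions made for illustration: a plain bag-of-words count vector stands in for a Doc2Vec embedding so that the example runs with the standard library alone, and the greedy best-match strategy with its `threshold` parameter is hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (NOT Doc2Vec)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(original, simplified, threshold=0.3):
    """Pair each simplified unit with its most similar original unit.

    Returns (simplified_index, original_index, score) triples.
    Pairs scoring below the (hypothetical) threshold are dropped.
    """
    orig_vecs = [embed(unit) for unit in original]
    if not orig_vecs:
        return []
    pairs = []
    for j, unit in enumerate(simplified):
        vec = embed(unit)
        scores = [cosine(vec, o) for o in orig_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((j, best, scores[best]))
    return pairs


# Illustrative input; the units could be sentences, paragraphs, or chapters.
original = [
    "the mind perceives ideas through the senses",
    "all knowledge is founded on experience",
]
simplified = [
    "knowledge comes from experience",
    "the mind gets ideas from the senses",
]
for j, i, score in align(original, simplified):
    print(f"simplified[{j}] <-> original[{i}] (similarity {score:.2f})")
```

Because the units are just strings, the same function can be applied at different levels of granularity. Swapping `embed` for vectors inferred by a trained Doc2Vec model would bring the sketch closer to the setting the abstract describes.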
References used
https://aclanthology.org/