Automatic Text Summarization (ATS) is the task of generating concise and fluent summaries from one or more documents. In this paper, we present IceSum, the first Icelandic corpus annotated with human-generated summaries. IceSum consists of 1,000 online news articles and their extractive summaries. We train and evaluate several neural network-based models on this dataset and compare them against a selection of baseline methods. We find that an encoder-decoder model with a sequence-to-sequence-based extractor obtains the best results, outperforming all baseline methods. Furthermore, we evaluate how the size of the training corpus affects the quality of the generated summaries. We release the corpus and the models under an open license.
References used
https://aclanthology.org/