IceSum: An Icelandic Text Summarization Corpus

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

Automatic Text Summarization (ATS) is the task of generating concise and fluent summaries from one or more documents. In this paper, we present IceSum, the first Icelandic corpus annotated with human-generated summaries. IceSum consists of 1,000 online news articles and their extractive summaries. We train and evaluate several neural network-based models on this dataset, comparing them against a selection of baseline methods. We find that an encoder-decoder model with a sequence-to-sequence based extractor obtains the best results, outperforming all baseline methods. Furthermore, we evaluate how the size of the training corpus affects the quality of the generated summaries. We release the corpus and the models with an open license.
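The abstract does not say which baseline methods were used or how summaries were scored. As a rough illustration of what an extractive baseline and its evaluation can look like, the sketch below implements a lead-k baseline (take the first k sentences) and a simplified unigram-overlap F1 in the spirit of ROUGE-1. Both choices are assumptions made for illustration, not the paper's setup, and the function names and the sample article are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def lead_k(text, k=3):
    """Lead-k baseline (assumed, not from the paper): first k sentences."""
    return split_sentences(text)[:k]


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1: F1 over clipped unigram overlap, no stemming."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((cand & ref).values())  # clipped counts of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy article and reference summary, for illustration only.
article = (
    "Storm hits Reykjavik. Roads are closed. "
    "Schools stay open. Officials urge caution."
)
reference = "Storm hits Reykjavik. Officials urge caution."

summary = " ".join(lead_k(article, k=2))
score = rouge1_f1(summary, reference)
print(summary)
print(round(score, 3))
```

A real evaluation would use an established ROUGE implementation and a proper sentence tokenizer for Icelandic; the regular-expression splitter here is only adequate for simple, well-punctuated text.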

References used

https://aclanthology.org/

EnKhCorp1.0: An English--Khasi Corpus

555 - Association for Computational Linguistics 2021 (article)

In machine translation, corpus preparation is one of the crucial tasks, particularly for low-resource pairs. In multilingual countries like India, machine translation plays a vital role in communication among people with various linguistic backgrounds. Online automatic translation systems from Google and Microsoft cover many languages but lack support for Khasi, which can hence be considered low-resource. This paper gives an overview of the development of EnKhCorp1.0, a corpus for the English--Khasi pair, and of baseline systems implemented for English-to-Khasi and Khasi-to-English translation based on the neural machine translation approach.

Kawarith: an Arabic Twitter Corpus for Crisis Events

830 - Association for Computational Linguistics 2021 (article)

Social media (SM) platforms such as Twitter provide large quantities of real-time data that can be leveraged during mass emergencies. Developing tools to support crisis-affected communities requires available datasets, which often do not exist for low-resource languages. This paper introduces Kawarith, a multi-dialect Arabic Twitter corpus for crisis events, comprising more than a million Arabic tweets collected during 22 crises that occurred between 2018 and 2020 and involved several types of hazard. Exploration of this content revealed the most discussed topics and information types, and the paper presents a labelled dataset from seven emergency events that serves as a gold standard for several tasks in crisis informatics research. Using annotated data from the same event, a BERT model is fine-tuned to classify tweets into different categories in the multi-label setting. Results show that BERT-based models yield good performance on this task even with small amounts of task-specific training data.

Multiplex Graph Neural Network for Extractive Text Summarization

1097 - Association for Computational Linguistics 2021 (article)

Extractive text summarization aims at extracting the most representative sentences from a given document as its summary. To extract a good summary from a long text document, sentence embedding plays an important role. Recent studies have leveraged graph neural networks to capture the inter-sentential relationship (e.g., the discourse graph) within the documents to learn contextual sentence embedding. However, those approaches neither consider multiple types of inter-sentential relationships (e.g., semantic similarity and natural connection relationships), nor model intra-sentential relationships (e.g., semantic similarity and syntactic relationship among words). To address these problems, we propose a novel Multiplex Graph Convolutional Network (Multi-GCN) to jointly model different types of relationships among sentences and words. Based on Multi-GCN, we propose a Multiplex Graph Summarization (Multi-GraS) model for extractive text summarization. Finally, we evaluate the proposed models on the CNN/DailyMail benchmark dataset to demonstrate the effectiveness of our method.
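To make the "multiple relation types" idea concrete, here is a toy sketch that builds one sentence graph per relation and propagates sentence scores over each graph before averaging the results. It is not the paper's Multi-GCN: there are no learned weights or embeddings, the bag-of-words cosine similarity stands in for semantic similarity, and reading "natural connection" as "adjacent sentences in the document" is an assumption. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def bow(sentence):
    """Bag-of-words counts over lowercase tokens."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_graph(sentences, threshold=0.2):
    """Relation 1 (assumed proxy): link sentences with similar wording."""
    vecs = [bow(s) for s in sentences]
    n = len(sentences)
    return [
        [1.0 if i != j and cosine(vecs[i], vecs[j]) >= threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def adjacency_graph(n):
    """Relation 2 (assumed reading): link consecutive sentences."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


def propagate(adj, scores):
    """Mean aggregation with a self-loop: average own and neighbour scores."""
    out = []
    for i, row in enumerate(adj):
        total = scores[i] + sum(w * s for w, s in zip(row, scores))
        out.append(total / (1.0 + sum(row)))
    return out


def multiplex_step(graphs, scores):
    """Propagate over each relation graph separately, then average."""
    per_relation = [propagate(g, scores) for g in graphs]
    return [sum(vals) / len(graphs) for vals in zip(*per_relation)]


sentences = [
    "The cat sat on the mat.",
    "The cat ate fish.",
    "Stocks fell sharply today.",
]
graphs = [similarity_graph(sentences), adjacency_graph(len(sentences))]
scores = multiplex_step(graphs, [1.0, 0.0, 0.0])
print(scores)
```

In this toy run, the second sentence receives score from the first through both relations, while the third sentence, linked to neither source of score, stays at zero. A graph convolutional layer would replace the plain averaging with learned linear transforms and a nonlinearity over sentence embeddings.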

DIRECT: Direct and Indirect Responses in Conversational Text Corpus

740 - Association for Computational Linguistics 2021 (article)

We create a large-scale dialogue corpus that provides pragmatic paraphrases to advance technology for understanding the underlying intentions of users. While neural conversation models acquire the ability to generate fluent responses through training on a dialogue corpus, previous corpora have mainly focused on the literal meanings of utterances. However, in reality, people do not always present their intentions directly. For example, if a person says "I don't have enough budget." to the operator of a reservation service, they in fact mean "please find a cheaper option for me." Our corpus provides a total of 71,498 indirect--direct utterance pairs accompanied by a multi-turn dialogue history extracted from the MultiWoZ dataset. In addition, we propose three tasks to benchmark the ability of models to recognize and generate indirect and direct utterances. We also investigate the performance of state-of-the-art pre-trained models as baselines.

Sarcasm Detection and Building an English Language Corpus in Real Time

753 - Association for Computational Linguistics 2021 (article)

This is a research proposal for doctoral research into sarcasm detection, and the real-time compilation of an English language corpus of sarcastic utterances. It details the previous research into similar topics, the potential research directions and the research aims.
