LongSumm 2021: Session based automatic summarization model for scientific document

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: based automatic summarization; session based automatic; scientific document

Most summarization tasks focus on generating relatively short summaries. Such a length constraint might not be appropriate when summarizing scientific work. The LongSumm task requires participants to generate long summaries for scientific documents. This task can usually be solved with a language model. An important problem, however, is that a model like BERT is limited by memory and cannot handle an input as long as a full document. Generating a long output is also hard. In this paper, we propose a session-based automatic summarization model (SBAS), which uses a session and ensemble mechanism to generate long summaries. Our model achieves the best performance in the LongSumm task.
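The abstract does not say how a session is built, so the following is only a minimal sketch under stated assumptions: a session is taken to be a run of consecutive sentences that fits a fixed word budget, each session is summarized independently, and the partial summaries are concatenated. The names `split_into_sessions`, `summarize_long_document`, and `first_sentence` are hypothetical, the stand-in summarizer is a placeholder for a real model, and the ensemble step mentioned in the abstract is not shown.

```python
from typing import Callable, List


def split_into_sessions(sentences: List[str], max_words: int) -> List[List[str]]:
    """Greedily pack consecutive sentences into sessions of at most max_words words.

    Assumption: a single sentence longer than max_words is kept whole
    in its own session rather than being truncated.
    """
    sessions: List[List[str]] = []
    current: List[str] = []
    current_words = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current session when this sentence would exceed the budget.
        if current and current_words + n_words > max_words:
            sessions.append(current)
            current, current_words = [], 0
        current.append(sentence)
        current_words += n_words
    if current:
        sessions.append(current)
    return sessions


def summarize_long_document(
    sentences: List[str],
    max_words: int,
    summarize_session: Callable[[List[str]], str],
) -> str:
    """Summarize each session separately and join the partial summaries."""
    sessions = split_into_sessions(sentences, max_words)
    return " ".join(summarize_session(session) for session in sessions)


def first_sentence(session: List[str]) -> str:
    """Placeholder summarizer: returns the first sentence of the session."""
    return session[0]


if __name__ == "__main__":
    doc = ["a b c", "d e", "f g h i", "j"]
    print(split_into_sessions(doc, max_words=5))
    print(summarize_long_document(doc, max_words=5, summarize_session=first_sentence))
```

Because each session stays under the budget, a memory-limited model only ever sees a short input, while the concatenated output can grow with the length of the document.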

References used

https://aclanthology.org/

D2S: Document-to-Slide Generation Via Query-Based Text Summarization

745 - Association for Computational Linguistics 2021 (article)

Presentations are critical for communication in all areas of our lives, yet the creation of slide decks is often tedious and time-consuming. There has been limited research aiming to automate the document-to-slides generation process and all face a critical challenge: no publicly available dataset for training and benchmarking. In this work, we first contribute a new dataset, SciDuet, consisting of pairs of papers and their corresponding slide decks from recent years' NLP and ML conferences (e.g., ACL). Secondly, we present D2S, a novel system that tackles the document-to-slides task with a two-step approach: 1) Use slide titles to retrieve relevant and engaging text, figures, and tables; 2) Summarize the retrieved context into bullet points with long-form question answering. Our evaluation suggests that long-form QA outperforms state-of-the-art summarization baselines on both automated ROUGE metrics and qualitative human evaluation.

Keywords: query-based text summarization; query-based text; generation via query-based

YNU-HPCC at SemEval-2021 Task 10: Using a Transformer-based Source-Free Domain Adaptation Model for Semantic Processing

933 - Association for Computational Linguistics 2021 (article)

Data sharing restrictions are common in NLP datasets. The purpose of this task is to develop a model trained in a source domain to make predictions for a target domain with related domain data. To address the issue, the organizers provided participants with models that were fine-tuned from pre-trained models on a large amount of source domain data, along with the dev data, but the source domain data itself was not distributed. This paper describes the provided model for the NER (named entity recognition) task and the ways to develop the model. Because little data was provided, pre-trained models are suitable for solving cross-domain tasks. Models fine-tuned on a large amount of data from another domain can be effective in a new domain because the task itself does not change.

Keywords: domain adaptation detection; transformer-based source-free domain

Leveraging Information Bottleneck for Scientific Document Summarization

1173 - Association for Computational Linguistics 2021 (article)

This paper presents an unsupervised extractive approach to summarize scientific long documents based on the Information Bottleneck principle. Inspired by previous work which uses the Information Bottleneck principle for sentence compression, we extend it to document level summarization with two separate steps. In the first step, we use signal(s) as queries to retrieve the key content from the source document. Then, a pre-trained language model conducts further sentence search and edit to return the final extracted summaries. Importantly, our work can be flexibly extended to a multi-view framework by different signals. Automatic evaluation on three scientific document datasets verifies the effectiveness of the proposed framework. The further human evaluation suggests that the extracted summaries cover more content aspects than previous systems.

Keywords: leveraging information bottleneck

The Effect of Pretraining on Extractive Summarization for Scientific Documents

854 - Association for Computational Linguistics 2021 (article)

Large pretrained models have seen enormous success in extractive summarization tasks. In this work, we investigate the influence of pretraining on a BERT-based extractive summarization system for scientific documents. We derive significant performance improvements using an intermediate pretraining step that leverages existing summarization datasets and report state-of-the-art results on a recently released scientific summarization dataset, SciTLDR. We systematically analyze the intermediate pretraining step by varying the size and domain of the pretraining corpus, changing the length of the input sequence in the target task and varying target tasks. We also investigate how intermediate pretraining interacts with contextualized word embeddings trained on different domains.

Keywords: multilingual bootstrapping; extractive summarization tasks

Semantic-Based Opinion Summarization

779 - Association for Computational Linguistics 2021 (article)

The amount of information available online can be overwhelming for users to digest, especially when dealing with other users' comments when making a decision about buying a product or service. In this context, opinion summarization systems are of great value, extracting important information from the texts and presenting it to the user in a more understandable manner. It is also known that the usage of semantic representations can benefit the quality of the generated summaries. This paper aims at developing opinion summarization methods based on Abstract Meaning Representation of texts in the Brazilian Portuguese language. Four different methods have been investigated, alongside some literature approaches. The results show that a Machine Learning-based method produced summaries of higher quality, outperforming other literature techniques on manually constructed semantic graphs. We also show that using parsed graphs over manually annotated ones harmed the output. Finally, an analysis of how important different types of information are for the summarization process suggests that using Sentiment Analysis features did not improve summary quality.

Keywords: semantic-based opinion summarization; opinion summarization; semantic-based opinion
