Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

A Review on Document Information Extraction Approaches

مراجعة حول نهج استخراج معلومات المستند

1296 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

نصوص وسائل الإعلام استخراج المعلومات صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Information extraction from documents has become great use of novel natural language processing areas. Most of the entity extraction methodologies are variant in a context such as medical area, financial area, also come even limited to the given language. It is better to have one generic approach applicable for any document type to extract entity information regardless of language, context, and structure. Also, another issue in such research is structural analysis while keeping the hierarchical, semantic, and heuristic features. Another problem identified is that usually, it requires a massive training corpus. Therefore, this research focus on mitigating such barriers. Several approaches have been identifying towards building document information extractors focusing on different disciplines. This research area involves natural language processing, semantic analysis, information extraction, and conceptual modelling. This paper presents a review of the information extraction mechanism to construct a generic framework for document extraction with aim of providing a solid base for upcoming research.

References used

https://aclanthology.org/

rate research

Entity and Evidence Guided Document-Level Relation Extraction

743 - Association for Computation Linguistics 2021 مقالة

Document-level relation extraction is a challenging task, requiring reasoning over multiple sentences to predict a set of relations in a document. In this paper, we propose a novel framework E2GRE (Entity and Evidence Guided Relation Extraction) that jointly extracts relations and the underlying evidence sentences by using large pretrained language model (LM) as input encoder. First, we propose to guide the pretrained LM's attention mechanism to focus on relevant context by using attention probabilities as additional features for evidence prediction. Furthermore, instead of feeding the whole document into pretrained LMs to obtain entity representation, we concatenate document text with head entities to help LMs concentrate on parts of the document that are more related to the head entity. Our E2GRE jointly learns relation extraction and evidence prediction effectively, showing large gains on both these tasks, which we find are highly correlated.

document-level relation extraction guided relation extraction استخراج العلاقة على مستوى المستند استخراج العلاقة الموجهة استخراج العلاقة صناعة حمض الفوسفور

Unsupervised Relation Extraction: A Variational Autoencoder Approach

910 - Association for Computation Linguistics 2021 مقالة

Unsupervised relation extraction works by clustering entity pairs that have the same relations in the text. Some existing variational autoencoder (VAE)-based approaches train the relation extraction model as an encoder that generates relation classif ications. A decoder is trained along with the encoder to reconstruct the encoder input based on the encoder-generated relation classifications. These classifications are a latent variable so they are required to follow a pre-defined prior distribution which results in unstable training. We propose a VAE-based unsupervised relation extraction technique that overcomes this limitation by using the classifications as an intermediate variable instead of a latent variable. Specifically, classifications are conditioned on sentence input, while the latent variable is conditioned on both the classifications and the sentence input. This allows our model to connect the decoder with the encoder without putting restrictions on the classification distribution; which improves training stability. Our approach is evaluated on the NYT dataset and outperforms state-of-the-art methods.

unsupervised relation extraction variational autoencoder approach استخراج العلاقة غير المدعومة نهج السيارات الآلي المتغير صناعة حمض الفوسفور

Document-level Event Extraction with Efficient End-to-end Learning of Cross-event Dependencies

725 - Association for Computation Linguistics 2021 مقالة

Fully understanding narratives often requires identifying events in the context of whole documents and modeling the event relations. However, document-level event extraction is a challenging task as it requires the extraction of event and entity core ference, and capturing arguments that span across different sentences. Existing works on event extraction usually confine on extracting events from single sentences, which fail to capture the relationships between the event mentions at the scale of a document, as well as the event arguments that appear in a different sentence than the event trigger. In this paper, we propose an end-to-end model leveraging Deep Value Networks (DVN), a structured prediction algorithm, to efficiently capture cross-event dependencies for document-level event extraction. Experimental results show that our approach achieves comparable performance to CRF-based models on ACE05, while enjoys significantly higher computational efficiency.

document-level event extraction event extraction استخراج الأحداث على مستوى المستند استخراج الأحداث هدف صناعة حمض الفوسفور

Cost-effective End-to-end Information Extraction for Semi-structured Document Images

717 - Association for Computation Linguistics 2021 مقالة

A real-world information extraction (IE) system for semi-structured document images often involves a long pipeline of multiple modules, whose complexity dramatically increases its development and maintenance cost. One can instead consider an end-to-e nd model that directly maps the input to the target output and simplify the entire process. However, such generation approach is known to lead to unstable performance if not designed carefully. Here we present our recent effort on transitioning from our existing pipeline-based IE system to an end-to-end system focusing on practical challenges that are associated with replacing and deploying the system in real, large-scale production. By carefully formulating document IE as a sequence generation task, we show that a single end-to-end IE system can be built and still achieve competent performance.

semi-structured document images semi-structured document صور وثيقة شبه منظمة وثيقة شبه منظمة صناعة حمض الفوسفور

Leveraging Information Bottleneck for Scientific Document Summarization

1141 - Association for Computation Linguistics 2021 مقالة

This paper presents an unsupervised extractive approach to summarize scientific long documents based on the Information Bottleneck principle. Inspired by previous work which uses the Information Bottleneck principle for sentence compression, we exten d it to document level summarization with two separate steps. In the first step, we use signal(s) as queries to retrieve the key content from the source document. Then, a pre-trained language model conducts further sentence search and edit to return the final extracted summaries. Importantly, our work can be flexibly extended to a multi-view framework by different signals. Automatic evaluation on three scientific document datasets verifies the effectiveness of the proposed framework. The further human evaluation suggests that the extracted summaries cover more content aspects than previous systems.

leveraging information bottleneck الاستفادة من المعلومات عنق الزجاجة صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

A Review on Document Information Extraction Approaches

مراجعة حول نهج استخراج معلومات المستند

Ask ChatGPT about the research

Read More

suggested questions