Knowledge Graph Enhanced Event Extraction in Financial Documents

Published by Kaihao Guo

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Kaihao Guo, Tianpei Jiang, Haipeng Zhang

Topics: Computation and Language; Information Retrieval; Machine Learning

Abstract

Event extraction is a classic task in natural language processing, widely used to handle the large and rapidly growing volume of financial, legal, medical, and government documents. Such documents often contain multiple events whose elements are scattered and mixed across the text, which makes the problem much more difficult. Although the underlying relations between the event elements to be extracted provide helpful contextual information, they have been somewhat overlooked in prior studies. We showcase the enhancement to this task brought by a knowledge graph that captures entity relations and their attributes. We propose the first event extraction framework that embeds a knowledge graph through a Graph Neural Network and integrates the embedding with regular features, all at the document level. Specifically, for extracting events from Chinese financial announcements, our method outperforms the state-of-the-art method by 5.3% in F1-score.
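
The abstract does not specify the Graph Neural Network or how the embedding is fused with the regular features, so the following is only a minimal sketch of the general idea: one round of mean-neighbor message passing over a toy knowledge graph, followed by concatenation with a text feature vector. The entity names, feature values, and the helper names `mean_aggregate` and `fuse` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean_aggregate(node_feats: Dict[str, Vector],
                   edges: List[Tuple[str, str]]) -> Dict[str, Vector]:
    """One round of mean-neighbor message passing on an undirected graph.

    Each node's new embedding is the mean of its own features and the
    features of its direct neighbors (a generic GNN-style update, assumed
    here for illustration only).
    """
    neighbors = {node: [node] for node in node_feats}  # include self-loop
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)

    updated = {}
    for node, nbrs in neighbors.items():
        dim = len(node_feats[node])
        updated[node] = [
            sum(node_feats[m][i] for m in nbrs) / len(nbrs) for i in range(dim)
        ]
    return updated


def fuse(text_feat: Vector, graph_emb: Vector) -> Vector:
    """Integrate a graph embedding with regular features by concatenation."""
    return text_feat + graph_emb


# Hypothetical toy knowledge graph of entities in a financial announcement.
node_feats = {
    "CompanyA": [1.0, 0.0],
    "ShareholderB": [0.0, 1.0],
    "CompanyC": [1.0, 1.0],
}
edges = [("ShareholderB", "CompanyA"), ("CompanyA", "CompanyC")]

graph_embs = mean_aggregate(node_feats, edges)
# Hypothetical 3-dimensional text feature for the mention of CompanyA.
fused = fuse([0.5, 0.5, 0.5], graph_embs["CompanyA"])
```

In this sketch the fused vector has five dimensions (three text features plus two graph dimensions); a downstream classifier would consume it in place of the text features alone.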

Misha Denil, Alban Demiraj, Nando de Freitas (2014)

We present a hierarchical convolutional document model with an architecture designed to support introspection of the document structure. Using this model, we show how to use visualisation techniques from the computer vision literature to identify and extract topic-relevant sentences. We also introduce a new scalable evaluation technique for automatic sentence extraction systems that avoids the need for time-consuming human annotation of validation data.

Topics: Computation and Language; Information Retrieval; Machine Learning

Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents

Ruixue Zhang, Wei Yang, Luyun Lin (2020)

Techniques for automatically extracting important content elements from business documents such as contracts, statements, and filings have the potential to make business operations more efficient. This problem can be formulated as a sequence labeling task, and we demonstrate the adaptation of BERT to two types of business documents: regulatory filings and property lease agreements. Some aspects of this problem make it easier than standard information extraction tasks and others make it more difficult, but on balance we find that modest amounts of annotated data (fewer than 100 documents) are sufficient to achieve reasonable accuracy. We integrate our models into an end-to-end cloud platform that provides both an easy-to-use annotation interface and an inference interface that allows users to upload documents and inspect model outputs.
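
To illustrate what formulating extraction as a sequence labeling task can look like in practice, here is a minimal sketch that decodes per-token BIO tags into labeled spans. The BIO scheme, the `DATE` and `TENANT` labels, and the example sentence are assumptions chosen for illustration; the abstract does not state which tagging scheme or label set the authors use.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Convert per-token BIO tags into (label, text) spans.

    "B-X" starts a span of type X, "I-X" continues the current span of
    type X, and anything else (including a mismatched "I-" tag) closes
    the current span without starting a new one.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_tokens, current_label = [], None

    if current_tokens:  # flush a span that runs to the end of the sentence
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical tagger output for a lease-agreement sentence.
tokens = ["Lease", "begins", "on", "1", "May", "2020", "for", "Acme", "Corp"]
tags = ["O", "O", "O", "B-DATE", "I-DATE", "I-DATE", "O", "B-TENANT", "I-TENANT"]
spans = decode_bio(tokens, tags)
```

A tagger such as a fine-tuned BERT model would produce the `tags` list; the decoding step shown here is independent of the model that generates it.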

Topics: Computation and Language; Information Retrieval; Machine Learning

Enriching BERT with Knowledge Graph Embeddings for Document Classification

Malte Ostendorff, Peter Bourgonje, Maria Berger (2019)

In this paper, we focus on the classification of books using short descriptive texts (cover blurbs) and additional metadata. Building upon BERT, a deep neural language model, we demonstrate how to combine text representations with metadata and knowledge graph embeddings, which encode author information. Compared to the standard BERT approach we achieve considerably better results for the classification task. For a more coarse-grained classification using eight labels we achieve an F1-score of 87.20, while a detailed classification using 343 labels yields an F1-score of 64.70. We make the source code and trained models of our experiments publicly available.

Topics: Computation and Language; Information Retrieval; Machine Learning

Biomedical Event Extraction with Hierarchical Knowledge Graphs

Kung-Hsiang Huang, Mu Yang, Nanyun Peng (2020)

Biomedical event extraction is critical in understanding biomolecular interactions described in the scientific corpus. One of the main challenges is to identify nested structured events that are associated with non-indicative trigger words. We propose to incorporate domain knowledge from the Unified Medical Language System (UMLS) into a pre-trained language model via Graph Edge-conditioned Attention Networks (GEANet) and hierarchical graph representation. To better recognize the trigger words, each sentence is first grounded to a sentence graph based on a jointly modeled hierarchical knowledge graph from UMLS. The grounded graphs are then propagated by GEANet, a novel graph neural network with enhanced capabilities for inferring complex events. On the BioNLP 2011 GENIA Event Extraction task, our approach achieved 1.41% F1 and 3.19% F1 improvements on all events and complex events, respectively. Ablation studies confirm the importance of GEANet and the hierarchical KG.

Topics: Computation and Language; Artificial Intelligence

Inductively Representing Out-of-Knowledge-Graph Entities by Optimal Estimation Under Translational Assumptions

Damai Dai, Hua Zheng, Fuli Luo (2020)

Conventional Knowledge Graph Completion (KGC) assumes that all test entities appear during training. However, in real-world scenarios, Knowledge Graphs (KG) evolve fast with out-of-knowledge-graph (OOKG) entities added frequently, and we need to represent these entities efficiently. Most existing Knowledge Graph Embedding (KGE) methods cannot represent OOKG entities without costly retraining on the whole KG. To enhance efficiency, we propose a simple and effective method that inductively represents OOKG entities by their optimal estimation under translational assumptions. Given pretrained embeddings of the in-knowledge-graph (IKG) entities, our method needs no additional learning. Experimental results show that our method outperforms the state-of-the-art methods with higher efficiency on two KGC tasks with OOKG entities.
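
The abstract does not give the estimator itself, so the sketch below only illustrates one plausible reading of "estimation under translational assumptions", using the TransE-style assumption that head + relation ≈ tail. Under that assumption each known triple involving the new entity yields one estimate of its embedding, and averaging those estimates is the least-squares solution. The function name `estimate_ookg_embedding`, the entities, relations, and vectors are all hypothetical, and this is not presented as the authors' exact method.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Triple = Tuple[str, str, str]  # (head, relation, tail)


def estimate_ookg_embedding(new_entity: str,
                            triples: List[Triple],
                            entity_emb: Dict[str, Vector],
                            relation_emb: Dict[str, Vector]) -> Vector:
    """Estimate an embedding for an unseen entity without any retraining.

    Assumes the translational relation head + relation ≈ tail:
      (new, r, t) gives the estimate t - r
      (h, r, new) gives the estimate h + r
    The final embedding is the mean of all such estimates.
    """
    estimates: List[Vector] = []
    for head, rel, tail in triples:
        r = relation_emb[rel]
        if head == new_entity and tail in entity_emb:
            estimates.append([t - x for t, x in zip(entity_emb[tail], r)])
        elif tail == new_entity and head in entity_emb:
            estimates.append([h + x for h, x in zip(entity_emb[head], r)])

    if not estimates:
        raise ValueError("no triples link the new entity to known entities")

    dim = len(estimates[0])
    return [sum(e[i] for e in estimates) / len(estimates) for i in range(dim)]


# Hypothetical pretrained embeddings for in-knowledge-graph entities.
entity_emb = {"Paris": [1.0, 2.0], "France": [3.0, 2.0]}
relation_emb = {"located_in": [1.0, 1.0], "contains": [-2.0, -2.0]}

# Hypothetical triples that mention the new entity "Louvre".
triples = [("Louvre", "located_in", "Paris"), ("France", "contains", "Louvre")]
louvre = estimate_ookg_embedding("Louvre", triples, entity_emb, relation_emb)
```

Because the estimate is computed directly from pretrained vectors, no gradient step or retraining on the whole graph is involved, which matches the efficiency motivation described in the abstract.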

Topics: Computation and Language; Information Retrieval; Machine Learning