ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction

76 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Chen-Yu Lee

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Chen-Yu Lee - Chun-Liang Li - Chu Wang

الحساب واللغة التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Natural reading orders of words are crucial for information extraction from form-like documents. Despite recent advances in Graph Convolutional Networks (GCNs) on modeling spatial layout patterns of documents, they have limited ability to capture reading orders of given word-level node representations in a graph. We propose Reading Order Equivariant Positional Encoding (ROPE), a new positional encoding technique designed to apprehend the sequential presentation of words in documents. ROPE generates unique reading order codes for neighboring words relative to the target word given a word-level graph connectivity. We study two fundamental document entity extraction tasks including word labeling and word grouping on the public FUNSD dataset and a large-scale payment dataset. We show that ROPE consistently improves existing GCNs with a margin up to 8.4% F1-score.

قيم البحث

218 - Shuang Zeng , Runxin Xu , Baobao Chang 2020

Document-level relation extraction aims to extract relations among entities within a document. Different from sentence-level relation extraction, it requires reasoning over multiple sentences across a document. In this paper, we propose Graph Aggrega tion-and-Inference Network (GAIN) featuring double graphs. GAIN first constructs a heterogeneous mention-level graph (hMG) to model complex interaction among different mentions across the document. It also constructs an entity-level graph (EG), based on which we propose a novel path reasoning mechanism to infer relations between entities. Experiments on the public dataset, DocRED, show GAIN achieves a significant performance improvement (2.85 on F1) over the previous state-of-the-art. Our code is available at https://github.com/DreamInvoker/GAIN .

الحساب واللغة الذكاء الاصطناعي التعلم الآلي

GraphIE: A Graph-Based Framework for Information Extraction

195 - Yujie Qian , Enrico Santus , Zhijing Jin 2018

Most modern Information Extraction (IE) systems are implemented as sequential taggers and only model local dependencies. Non-local and non-sequential context is, however, a valuable source of information to improve predictions. In this paper, we intr oduce GraphIE, a framework that operates over a graph representing a broad set of dependencies between textual units (i.e. words or sentences). The algorithm propagates information between connected nodes through graph convolutions, generating a richer representation that can be exploited to improve word-level predictions. Evaluation on three different tasks --- namely textual, social media and visual information extraction --- shows that GraphIE consistently outperforms the state-of-the-art sequence tagging model by a significant margin.

الحساب واللغة

TRIE: End-to-End Text Reading and Information Extraction for Document Understanding

87 - Peng Zhang , Yunlu Xu , Zhanzhan Cheng 2020

Since real-world ubiquitous documents (e.g., invoices, tickets, resumes and leaflets) contain rich information, automatic document image understanding has become a hot topic. Most existing works decouple the problem into two separate tasks, (1) text reading for detecting and recognizing texts in images and (2) information extraction for analyzing and extracting key elements from previously extracted plain text. However, they mainly focus on improving information extraction task, while neglecting the fact that text reading and information extraction are mutually correlated. In this paper, we propose a unified end-to-end text reading and information extraction network, where the two tasks can reinforce each other. Specifically, the multimodal visual and textual features of text reading are fused for information extraction and in turn, the semantics in information extraction contribute to the optimization of text reading. On three real-world datasets with diverse document images (from fixed layout to variable layout, from structured text to semi-structured text), our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.

الرؤية الحاسوبية وتمييز الأنماط

Sentence Extraction-Based Machine Reading Comprehension for Vietnamese

118 - Phong Nguyen-Thuan Do , Nhat Duy Nguyen , Tin Van Huynh 2021

The development of natural language processing (NLP) in general and machine reading comprehension in particular has attracted the great attention of the research community. In recent years, there are a few datasets for machine reading comprehension t asks in Vietnamese with large sizes, such as UIT-ViQuAD and UIT-ViNewsQA. However, the datasets are not diverse in answers to serve the research. In this paper, we introduce UIT-ViWikiQA, the first dataset for evaluating sentence extraction-based machine reading comprehension in the Vietnamese language. The UIT-ViWikiQA dataset is converted from the UIT-ViQuAD dataset, consisting of comprises 23.074 question-answers based on 5.109 passages of 174 Wikipedia Vietnamese articles. We propose a conversion algorithm to create the dataset for sentence extraction-based machine reading comprehension and three types of approaches for sentence extraction-based machine reading comprehension in Vietnamese. Our experiments show that the best machine model is XLM-R_Large, which achieves an exact match (EM) of 85.97% and an F1-score of 88.77% on our dataset. Besides, we analyze experimental results in terms of the question type in Vietnamese and the effect of context on the performance of the MRC models, thereby showing the challenges from the UIT-ViWikiQA dataset that we propose to the language processing community.

الحساب واللغة

Mention-centered Graph Neural Network for Document-level Relation Extraction

143 - Jiaxin Pan , Min Peng , Yiyan Zhang 2021

Document-level relation extraction aims to discover relations between entities across a whole document. How to build the dependency of entities from different sentences in a document remains to be a great challenge. Current approaches either leverage syntactic trees to construct document-level graphs or aggregate inference information from different sentences. In this paper, we build cross-sentence dependencies by inferring compositional relations between inter-sentence mentions. Adopting aggressive linking strategy, intermediate relations are reasoned on the document-level graphs by mention convolution. We further notice the generalization problem of NA instances, which is caused by incomplete annotation and worsened by fully-connected mention pairs. An improved ranking loss is proposed to attend this problem. Experiments show the connections between different mentions are crucial to document-level relation extraction, which enables the model to extract more meaningful higher-level compositional relations.

الحساب واللغة الذكاء الاصطناعي