New community

Subscribe to the gold package and get unlimited access to Shamra Academy

CoZo+ - A Content Zoning Engine for textual documents

139 0 0.0 ( 0 )

Download Cite

Added by Cynthia Wagner CW

Publication date 2008

fields Informatics Engineering

and research's language is English

Authors Cynthia Wagner - - Christoph Schommer

Computation and Language Information Retrieval

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Content zoning can be understood as a segmentation of textual documents into zones. This is inspired by [6] who initially proposed an approach for the argumentative zoning of textual documents. With the prototypical CoZo+ engine, we focus on content zoning towards an automatic processing of textual streams while considering only the actors as the zones. We gain information that can be used to realize an automatic recognition of content for pre-defined actors. We understand CoZo+ as a necessary pre-step towards an automatic generation of summaries and to make intellectual ownership of documents detectable.

rate research

Textual Analysis for Studying Chinese Historical Documents and Literary Novels

104 - Chao-Lin Liu , Guan-Tao Jin , Hongsu Wang 2015

We analyzed historical and literary documents in Chinese to gain insights into research issues, and overview our studies which utilized four different sources of text materials in this paper. We investigated the history of concepts and transliterated words in China with the Database for the Study of Modern China Thought and Literature, which contains historical documents about China between 1830 and 1930. We also attempted to disambiguate names that were shared by multiple government officers who served between 618 and 1912 and were recorded in Chinese local gazetteers. To showcase the potentials and challenges of computer-assisted analysis of Chinese literatures, we explored some interesting yet non-trivial questions about two of the Four Great Classical Novels of China: (1) Which monsters attempted to consume the Buddhist monk Xuanzang in the Journey to the West (JTTW), which was published in the 16th century, (2) Which was the most powerful monster in JTTW, and (3) Which major role smiled the most in the Dream of the Red Chamber, which was published in the 18th century. Similar approaches can be applied to the analysis and study of modern documents, such as the newspaper articles published about the 228 incident that occurred in 1947 in Taiwan.

Computation and Language Digital Libraries

Extraction of Salient Sentences from Labelled Documents

137 - Misha Denil , Alban Demiraj , Nando de Freitas 2014

We present a hierarchical convolutional document model with an architecture designed to support introspection of the document structure. Using this model, we show how to use visualisation techniques from the computer vision literature to identify and extract topic-relevant sentences. We also introduce a new scalable evaluation technique for automatic sentence extraction systems that avoids the need for time consuming human annotation of validation data.

Computation and Language Information Retrieval Machine Learning

Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents

153 - Ruixue Zhang , Wei Yang , Luyun Lin 2020

Techniques for automatically extracting important content elements from business documents such as contracts, statements, and filings have the potential to make business operations more efficient. This problem can be formulated as a sequence labeling task, and we demonstrate the adaption of BERT to two types of business documents: regulatory filings and property lease agreements. There are aspects of this problem that make it easier than standard information extraction tasks and other aspects that make it more difficult, but on balance we find that modest amounts of annotated data (less than 100 documents) are sufficient to achieve reasonable accuracy. We integrate our models into an end-to-end cloud platform that provides both an easy-to-use annotation interface as well as an inference interface that allows users to upload documents and inspect model outputs.

Computation and Language Information Retrieval Machine Learning

Knowledge Graph Enhanced Event Extraction in Financial Documents

153 - Kaihao Guo , Tianpei Jiang , Haipeng Zhang 2021

Event extraction is a classic task in natural language processing with wide use in handling large amount of yet rapidly growing financial, legal, medical, and government documents which often contain multiple events with their elements scattered and mixed across the documents, making the problem much more difficult. Though the underlying relations between event elements to be extracted provide helpful contextual information, they are somehow overlooked in prior studies. We showcase the enhancement to this task brought by utilizing the knowledge graph that captures entity relations and their attributes. We propose a first event extraction framework that embeds a knowledge graph through a Graph Neural Network and integrates the embedding with regular features, all at document-level. Specifically, for extracting events from Chinese financial announcements, our method outperforms the state-of-the-art method by 5.3% in F1-score.

Computation and Language Information Retrieval Machine Learning

Multilingual Email Zoning

56 - Bruno Jardim , Ricardo Rei , Mariana S. C. Almeida 2021

The segmentation of emails into functional zones (also dubbed email zoning) is a relevant preprocessing step for most NLP tasks that deal with emails. However, despite the multilingual character of emails and their applications, previous literature regarding email zoning corpora and systems was developed essentially for English. In this paper, we analyse the existing email zoning corpora and propose a new multilingual benchmark composed of 625 emails in Portuguese, Spanish and French. Moreover, we introduce OKAPI, the first multilingual email segmentation model based on a language agnostic sentence encoder. Besides generalizing well for unseen languages, our model is competitive with current English benchmarks, and reached new state-of-the-art performances for domain adaptation tasks in English.

Computation and Language Applications Machine Learning

comments

Fetching comments

Tishreen University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

CoZo+ - A Content Zoning Engine for textual documents

Ask ChatGPT about the research

No Arabic abstract

Read More