ترغب بنشر مسار تعليمي؟ اضغط هنا

Generating Concise and Readable Summaries of XML Documents

132   0   0.0 ( 0 )
 نشر من قبل Maya Ramanath
 تاريخ النشر 2009
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

XML has become the de-facto standard for data representation and exchange, resulting in large scale repositories and warehouses of XML data. In order for users to understand and explore these large collections, a summarized, birds eye view of the available data is a necessity. In this paper, we are interested in semantic XML document summaries which present the important information available in an XML document to the user. In the best case, such a summary is a concise replacement for the original document itself. At the other extreme, it should at least help the user make an informed choice as to the relevance of the document to his needs. In this paper, we address the two main issues which arise in producing such meaningful and concise summaries: i) which tags or text units are important and should be included in the summary, ii) how to generate summaries of different sizes.%for different memory budgets. We conduct user studies with different real-life datasets and show that our methods are useful and effective in practice.



قيم البحث

اقرأ أيضاً

Researchers and scientists increasingly find themselves in the position of having to quickly understand large amounts of technical material. Our goal is to effectively serve this need by using bibliometric text mining and summarization techniques to generate summaries of scientific literature. We show how we can use citations to produce automatically generated, readily consumable, technical extractive summaries. We first propose C-LexRank, a model for summarizing single scientific articles based on citations, which employs community detection and extracts salient information-rich sentences. Next, we further extend our experiments to summarize a set of papers, which cover the same scientific topic. We generate extractive summaries of a set of Question Answering (QA) and Dependency Parsing (DP) papers, their abstracts, and their citation sentences and show that citations have unique information amenable to creating a summary.
We study the problem of validating XML documents of size $N$ against general DTDs in the context of streaming algorithms. The starting point of this work is a well-known space lower bound. There are XML documents and DTDs for which $p$-pass streaming algorithms require $Omega(N/p)$ space. We show that when allowing access to external memory, there is a deterministic streaming algorithm that solves this problem with memory space $O(log^2 N)$, a constant number of auxiliary read/write streams, and $O(log N)$ total number of passes on the XML document and auxiliary streams. An important intermediate step of this algorithm is the computation of the First-Child-Next-Sibling (FCNS) encoding of the initial XML document in a streaming fashion. We study this problem independently, and we also provide memory efficient streaming algorithms for decoding an XML document given in its FCNS encoding. Furthermore, validating XML documents encoding binary trees in the usual streaming model without external memory can be done with sublinear memory. There is a one-pass algorithm using $O(sqrt{N log N})$ space, and a bidirectional two-pass algorithm using $O(log^2 N)$ space performing this task.
Transforming XML documents with conventional XML languages, like XSL-T, is disadvantageous because there is too lax abstraction on the target language and it is rather difficult to recognize rule-oriented transformations. Prolog as a programming lang uage of declarative paradigm is especially good for implementation of analysis of formal languages. Prolog seems also to be good for term manipulation, complex schema-transformation and text retrieval. In this report an appropriate model for XML documents is proposed, the basic transformation language for Prolog LTL is defined and the expressiveness power compared with XSL-T is demonstrated, the implementations used throughout are multi paradigmatic.
130 - Pierre Geneves 2014
This thesis describes the theoretical and practical foundations of a system for the static analysis of XML processing languages. The system relies on a fixpoint temporal logic with converse, derived from the mu-calculus, where models are finite trees . This calculus is expressive enough to capture regular tree types along with multi-directional navigation in trees, while having a single exponential time complexity. Specifically the decidability of the logic is proved in time 2^O(n) where n is the size of the input formula. Major XML concepts are linearly translated into the logic: XPath navigation and node selection semantics, and regular tree languages (which include DTDs and XML Schemas). Based on these embeddings, several problems of major importance in XML applications are reduced to satisfiability of the logic. These problems include XPath containment, emptiness, equivalence, overlap, coverage, in the presence or absence of regular tree type constraints, and the static type-checking of an annotated query. The focus is then given to a sound and complete algorithm for deciding the logic, along with a detailed complexity analysis, and crucial implementation techniques for building an effective solver. Practical experiments using a full implementation of the system are presented. The system appears to be efficient in practice for several realistic scenarios. The main application of this work is a new class of static analyzers for programming languages using both XPath expressions and XML type annotations (input and output). Such analyzers allow to ensure at compile-time valuable properties such as type-safety and optimizations, for safer and more efficient XML processing.
This document describes how to use the XML static analyzer in practice. It provides informal documentation for using the XML reasoning solver implementation. The solver allows automated verification of properties that are expressed as logical formula s over trees. A logical formula may for instance express structural constraints or navigation properties (like e.g. path existence and node selection) in finite trees. Logical formulas can be expressed using the syntax of XPath expressions, DTD, XML Schemas, and Relax NG definitions.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا