Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser

التقاط الهيكل المنطقي للوثائق المنظمة بصريا مع محلل انتقال متعدد الوسائط

611 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

multimodal transition parser visually structured documents capturing logical structure محلل الانتقال متعددة الوسائط وثائق منظمة بصريا التقاط الهيكل المنطقي صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

While many NLP pipelines assume raw, clean texts, many texts we encounter in the wild, including a vast majority of legal documents, are not so clean, with many of them being visually structured documents (VSDs) such as PDFs. Conventional preprocessing tools for VSDs mainly focused on word segmentation and coarse layout analysis, whereas fine-grained logical structure analysis (such as identifying paragraph boundaries and their hierarchies) of VSDs is underexplored. To that end, we proposed to formulate the task as prediction of transition labels'' between text fragments that maps the fragments to a tree, and developed a feature-based machine learning system that fuses visual, textual and semantic cues. Our system is easily customizable to different types of VSDs and it significantly outperformed baselines in identifying different structures in VSDs. For example, our system obtained a paragraph boundary detection F1 score of 0.953 which is significantly better than a popular PDF-to-text tool with an F1 score of 0.739.

References used

https://aclanthology.org/

rate research

A Transition-based Parser for Unscoped Episodic Logical Forms

735 - Association for Computation Linguistics 2021 مقالة

Episodic Logic: Unscoped Logical Form'' (EL-ULF) is a semantic representation capturing predicate-argument structure as well as more challenging aspects of language within the Episodic Logic formalism. We present the first learned approach for parsin g sentences into ULFs, using a growing set of annotated examples. The results provide a strong baseline for future improvement. Our method learns a sequence-to-sequence model for predicting the transition action sequence within a modified cache transition system. We evaluate the efficacy of type grammar-based constraints, a word-to-symbol lexicon, and transition system state features in this task. Our system is available at https://github.com/genelkim/ulf-transition-parser. We also present the first official annotated ULF dataset at https://www.cs.rochester.edu/u/gkim21/ulf/resources/.

unscoped logical form نموذج منطقي غير مستقيم صناعة حمض الفوسفور

Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference

711 - Association for Computation Linguistics 2021 مقالة

This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions. The dataset consists of 200 videos, 5,554 action l abels, and 1,942 action triplets of the form (subject, predicate, object) that can be easily translated into logical semantic representations. The dataset is expected to be useful for evaluating multimodal inference systems between videos and semantically complicated sentences including negation and quantification.

multimodal logical inference human actions dynamic human actions الاستدلال المنطقي متعدد الوسائط الإجراءات البشرية الإجراءات البشرية الديناميكية صناعة حمض الفوسفور المزيد..

Augmenting Knowledge-grounded Conversations with Sequential Knowledge Transition

835 - Association for Computation Linguistics 2021 مقالة

Knowledge data are massive and widespread in the real-world, which can serve as good external sources to enrich conversations. However, in knowledge-grounded conversations, current models still lack the fine-grained control over knowledge selection a nd integration with dialogues, which finally leads to the knowledge-irrelevant response generation problems: 1) knowledge selection merely relies on the dialogue context, ignoring the inherent knowledge transitions along with conversation flows; 2) the models often over-fit during training, resulting with incoherent response by referring to unrelated tokens from specific knowledge content in the testing phase; 3) although response is generated upon the dialogue history and knowledge, the models often tend to overlook the selected knowledge, and hence generates knowledge-irrelevant response. To address these problems, we proposed to explicitly model the knowledge transition in sequential multi-turn conversations by abstracting knowledge into topic tags. Besides, to fully utilizing the selected knowledge in generative process, we propose pre-training a knowledge-aware response generator to pay more attention on the selected knowledge. In particular, a sequential knowledge transition model equipped with a pre-trained knowledge-aware response generator (SKT-KG) formulates the high-level knowledge transition and fully utilizes the limited knowledge data. Experimental results on both structured and unstructured knowledge-grounded dialogue benchmarks indicate that our model achieves better performance over baseline models.

augmenting knowledge-grounded conversations sequential knowledge transition زيادة المحادثات المعرفة انتقال المعرفة المتسلسل صناعة حمض الفوسفور

EMISSOR: A platform for capturing multimodal interactions as Episodic Memories and Interpretations with Situated Scenario-based Ontological References

694 - Association for Computation Linguistics 2021 مقالة

We present EMISSOR: a platform to capture multimodal interactions as recordings of episodic experiences with explicit referential interpretations that also yield an episodic Knowledge Graph (eKG). The platform stores streams of multiple modalities as parallel signals. Each signal is segmented and annotated independently with interpretation. Annotations are eventually mapped to explicit identities and relations in the eKG. As we ground signal segments from different modalities to the same instance representations, we also ground different modalities across each other. Unique to our eKG is that it accepts different interpretations across modalities, sources and experiences and supports reasoning over conflicting information and uncertainties that may result from multimodal experiences. EMISSOR can record and annotate experiments in virtual and real-world, combine data, evaluate system behavior and their performance for preset goals but also model the accumulation of knowledge and interpretations in the Knowledge Graph as a result of these episodic experiences.

scenario-based ontological references situated scenario-based ontological ontological references المراجع القديمة القائمة على السيناريو تقع السيناريو على السيناريو المراجع الترطانية صناعة حمض الفوسفور المزيد..

Visually Grounded Concept Composition

609 - Association for Computation Linguistics 2021 مقالة

We investigate ways to compose complex concepts in texts from primitive ones while grounding them in images. We propose Concept and Relation Graph (CRG), which builds on top of constituency analysis and consists of recursively combined concepts with predicate functions. Meanwhile, we propose a concept composition neural network called Composer to leverage the CRG for visually grounded concept learning. Specifically, we learn the grounding of both primitive and all composed concepts by aligning them to images and show that learning to compose leads to more robust grounding results, measured in text-to-image matching accuracy. Notably, our model can model grounded concepts forming at both the finer-grained sentence level and the coarser-grained intermediate level (or word-level). Composer leads to pronounced improvement in matching accuracy when the evaluation data has significant compound divergence from the training data.

visually grounded concept grounded concept composition concept composition مفهوم الأساس بصريا تكوين مفهوم الأساس تكوين المفهوم صناعة حمض الفوسفور المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser

التقاط الهيكل المنطقي للوثائق المنظمة بصريا مع محلل انتقال متعدد الوسائط

Ask ChatGPT about the research

Read More

suggested questions