
EMISSOR: A platform for capturing multimodal interactions as Episodic Memories and Interpretations with Situated Scenario-based Ontological References



Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: scenario-based ontological references, situated scenario-based ontological, ontological references


We present EMISSOR: a platform to capture multimodal interactions as recordings of episodic experiences with explicit referential interpretations that also yield an episodic Knowledge Graph (eKG). The platform stores streams of multiple modalities as parallel signals. Each signal is segmented and annotated independently with interpretations. Annotations are eventually mapped to explicit identities and relations in the eKG. As we ground signal segments from different modalities to the same instance representations, we also ground the modalities to each other. Unique to our eKG is that it accepts different interpretations across modalities, sources and experiences, and supports reasoning over the conflicting information and uncertainties that may result from multimodal experiences. EMISSOR can record and annotate experiments in virtual and real-world settings, combine data, and evaluate system behavior and performance against preset goals, but it can also model the accumulation of knowledge and interpretations in the Knowledge Graph as a result of these episodic experiences.
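The grounding idea in the abstract can be illustrated with a small sketch. This is not EMISSOR's actual data model or API: the class names `Mention` and `Claim`, the helper functions, the instance identifiers and the `lives_in` predicate are all hypothetical, chosen only to show how segments from different modalities can point at the same instance and how conflicting interpretations from different sources can be kept side by side. Treating any predicate with more than one value as a conflict is also a simplifying assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A segment of one signal, grounded to an instance (hypothetical model)."""
    modality: str   # e.g. "image" or "text"
    segment: tuple  # bounding box or character offsets, depending on modality
    instance: str   # identifier of the instance in the knowledge graph
    source: str     # annotator or model that produced the interpretation


@dataclass(frozen=True)
class Claim:
    """A relation asserted by one source (hypothetical model)."""
    subject: str
    predicate: str
    obj: str
    source: str


def cross_modal_links(mentions):
    """Return instances that are referred to from more than one modality."""
    modalities = defaultdict(set)
    for m in mentions:
        modalities[m.instance].add(m.modality)
    return {inst: sorted(mods) for inst, mods in modalities.items() if len(mods) > 1}


def conflicts(claims):
    """Return (subject, predicate) pairs for which sources assert different objects."""
    values = defaultdict(set)
    for c in claims:
        values[(c.subject, c.predicate)].add(c.obj)
    return {key: sorted(objs) for key, objs in values.items() if len(objs) > 1}


# Illustrative data only; none of these values come from the paper.
mentions = [
    Mention("image", (10, 20, 110, 160), "person:1", "face_detector"),
    Mention("text", (0, 4), "person:1", "entity_linker"),
    Mention("text", (12, 18), "place:1", "entity_linker"),
]
claims = [
    Claim("person:1", "lives_in", "place:1", "speaker_a"),
    Claim("person:1", "lives_in", "place:2", "speaker_b"),
    Claim("person:1", "likes", "thing:1", "speaker_a"),
]

print(cross_modal_links(mentions))  # {'person:1': ['image', 'text']}
print(conflicts(claims))            # {('person:1', 'lives_in'): ['place:1', 'place:2']}
```

Both claims about `lives_in` are retained rather than overwritten, which mirrors the abstract's point that the graph accepts differing interpretations and leaves their resolution to later reasoning.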

References used

https://aclanthology.org/


Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text

271 - Association for Computational Linguistics, 2021 (article)

Communicating with humans is challenging for AIs because it requires a shared understanding of the world, complex semantics (e.g., metaphors or analogies), and at times multi-modal gestures (e.g., pointing with a finger, or an arrow in a diagram). We investigate these challenges in the context of Iconary, a collaborative game of drawing and guessing based on Pictionary, that poses a novel challenge for the research community. In Iconary, a Guesser tries to identify a phrase that a Drawer is drawing by composing icons, and the Drawer iteratively revises the drawing to help the Guesser in response. This back-and-forth often uses canonical scenes, visual metaphor, or icon compositions to express challenging words, making it an ideal test for mixing language and visual/symbolic communication in AI. We propose models to play Iconary and train them on over 55,000 games between human players. Our models are skillful players and are able to employ world knowledge in language models to play with words unseen during training.

Keywords: testing multimodal communication, testing multimodal, multimodal communication

When Retriever-Reader Meets Scenario-Based Multiple-Choice Questions

301 - Association for Computational Linguistics, 2021 (article)

Scenario-based question answering (SQA) requires retrieving and reading paragraphs from a large corpus to answer a question which is contextualized by a long scenario description. Since a scenario contains both keyphrases for retrieval and much noise, retrieval for SQA is extremely difficult. Moreover, it can hardly be supervised due to the lack of relevance labels of paragraphs for SQA. To meet the challenge, in this paper we propose a joint retriever-reader model called JEEVES where the retriever is implicitly supervised only using QA labels via a novel word weighting mechanism. JEEVES significantly outperforms a variety of strong baselines on multiple-choice questions in three SQA datasets.

Keywords: scenario-based question answering, scenario-based multiple-choice questions, SQA

Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser

305 - Association for Computational Linguistics, 2021 (article)

While many NLP pipelines assume raw, clean texts, many texts we encounter in the wild, including a vast majority of legal documents, are not so clean, with many of them being visually structured documents (VSDs) such as PDFs. Conventional preprocessing tools for VSDs mainly focused on word segmentation and coarse layout analysis, whereas fine-grained logical structure analysis (such as identifying paragraph boundaries and their hierarchies) of VSDs is underexplored. To that end, we proposed to formulate the task as prediction of "transition labels" between text fragments that map the fragments to a tree, and developed a feature-based machine learning system that fuses visual, textual and semantic cues. Our system is easily customizable to different types of VSDs and it significantly outperformed baselines in identifying different structures in VSDs. For example, our system obtained a paragraph boundary detection F1 score of 0.953, which is significantly better than a popular PDF-to-text tool with an F1 score of 0.739.

Keywords: multimodal transition parser, visually structured documents, capturing logical structure

An Analysis of State-of-the-Art Models for Situated Interactive MultiModal Conversations (SIMMC)

573 - Association for Computational Linguistics, 2021 (article)

There is a growing interest in virtual assistants with multimodal capabilities, e.g., inferring the context of a conversation through scene understanding. The recently released situated and interactive multimodal conversations (SIMMC) dataset addresses this trend by enabling research to create virtual assistants which are capable of taking into account the scene that the user sees when conversing with the user and also interacting with items in the scene. The SIMMC dataset is novel in that it contains fully annotated user-assistant, task-orientated dialogs where the user and an assistant co-observe the same visual elements and the latter can take actions to update the scene. The SIMMC challenge, held as part of the Ninth Dialog System Technology Challenge (DSTC9), propelled the development of various models which together set a new state-of-the-art on the SIMMC dataset. In this work, we compare and analyze these models to identify "what worked?", and the remaining gaps; "what next?". Our analysis shows that even though pretrained language models adapted to this setting show great promise, there are indications that multimodal context isn't fully utilised, and there is a need for better and scalable knowledge base integration. We hope this first-of-its-kind analysis for SIMMC models provides useful insights and opportunities for further research in multimodal conversational agents.

Keywords: interactive multimodal conversations, situated interactive multimodal, SIMMC

MIMOQA: Multimodal Input Multimodal Output Question Answering

388 - Association for Computational Linguistics, 2021 (article)

Multimodal research has picked up significantly in the space of question answering with the task being extended to visual question answering, charts question answering as well as multimodal input question answering. However, all these explorations produce a unimodal textual output as the answer. In this paper, we propose a novel task - MIMOQA - Multimodal Input Multimodal Output Question Answering in which the output is also multimodal. Through human experiments, we empirically show that such multimodal outputs provide better cognitive understanding of the answers. We also propose a novel multimodal question-answering framework, MExBERT, that incorporates a joint textual and visual attention towards producing such a multimodal output. Our method relies on a novel multimodal dataset curated for this problem from publicly available unimodal datasets. We show the superior performance of MExBERT against strong baselines on both the automatic as well as human metrics.

Keywords: metric learning approach, output question answering, input question answering
