Interactive Storytelling over Document Collections

Published by Mohammad Islam

Publication date: 2016

Research field: Informatics Engineering

Paper language: English

Authors: Dipayan Maiti, Mohammad Raihanul Islam, Scotland Leman

Abstract

Storytelling algorithms aim to connect the dots between disparate documents by linking starting and ending documents through a series of intermediate documents. Existing storytelling algorithms are based on notions of coherence and connectivity, so the primary way users can steer story construction is by designing suitable similarity functions. We present an alternative approach to storytelling in which the user can interactively and iteratively provide must-use constraints to preferentially support the construction of some stories over others. The three innovations in our approach are distance measures based on (inferred) topic distributions, the use of constraints to define sets of linear inequalities over paths, and the introduction of slack and surplus variables to condition the topic distribution to preferentially emphasize desired terms over others. We describe experimental results that illustrate the effectiveness of our interactive storytelling approach on multiple text datasets.
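The abstract does not spell out the distance measure or the optimization, so the following Python sketch only pictures the general idea under stated assumptions. Each document is reduced to a topic distribution (the toy vectors are invented), the Hellinger distance stands in for the paper's topic-based distance, two documents are linked only if their distance is at most a hypothetical `max_hop` threshold, and a single must-use document is honored by joining two Dijkstra shortest paths. The helper names (`hellinger`, `build_graph`, `shortest_path`, `story`) are hypothetical. The paper instead encodes constraints as linear inequalities over paths; this sketch reproduces neither that formulation nor the slack and surplus variables.

```python
import heapq
import math


def hellinger(p, q):
    """Hellinger distance between two discrete topic distributions (0 = identical)."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))


def build_graph(topics, max_hop):
    """Link two documents only if their topic distance is at most max_hop (assumed rule)."""
    graph = {doc: {} for doc in topics}
    names = list(topics)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            w = hellinger(topics[u], topics[v])
            if w <= max_hop:
                graph[u][v] = w
                graph[v][u] = w
    return graph


def shortest_path(graph, start, end):
    """Dijkstra: path with the lowest total topic distance, or None if unreachable."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == end:
            path = [end]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


def story(graph, start, end, must_use=None):
    """Story from start to end, optionally forced through one must-use document."""
    if must_use is None:
        return shortest_path(graph, start, end)
    first = shortest_path(graph, start, must_use)
    second = shortest_path(graph, must_use, end)
    if first is None or second is None:
        return None
    return first + second[1:]  # can revisit a document in general


# Invented three-topic distributions for five documents.
topics = {
    "a": [0.8, 0.1, 0.1],
    "b": [0.6, 0.3, 0.1],
    "c": [0.4, 0.5, 0.1],
    "d": [0.5, 0.1, 0.4],
    "e": [0.1, 0.8, 0.1],
}
g = build_graph(topics, max_hop=0.3)
print(story(g, "a", "e"))                # ['a', 'b', 'c', 'e']
print(story(g, "a", "e", must_use="d"))  # ['a', 'd', 'b', 'c', 'e']
```

The hop threshold matters: the Hellinger distance is a metric, so on a fully connected graph the direct edge would always be the shortest "story". Joining two shortest paths can also revisit a document, which is one reason a path-level formulation with linear constraints is more flexible than this sketch.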

Related research

Tianmin Shu, Caiming Xiong, Ying Nian Wu (2018)

The ability to model other agents, such as understanding their intentions and skills, is essential to an agent's interactions with other agents. Conventional agent modeling relies on passive observation of demonstrations. In this work, we propose an interactive agent modeling scheme enabled by encouraging an agent to learn to probe. In particular, the probing agent (i.e., a learner) learns to interact with the environment and with a target agent (i.e., a demonstrator) to maximize the change in the observed behaviors of that agent. Through probing, rich behaviors can be observed and used to enhance agent modeling, yielding a more accurate mind model of the target agent. Our framework consists of two learning processes: i) imitation learning for an approximate agent model, and ii) pure curiosity-driven reinforcement learning for an efficient probing policy that discovers new behaviors that otherwise cannot be observed. We have validated our approach on four different tasks. The experimental results suggest that the agent model learned by our approach i) generalizes better to novel scenarios than models learned by passive observation, random probing, and other curiosity-driven approaches, and ii) can be used to enhance performance in multiple applications, including distilling optimal planning into a policy net, collaboration, and competition. A video demo is available at https://www.dropbox.com/s/8mz6rd3349tso67/Probing_Demo.mov?dl=0

Artificial Intelligence, Machine Learning, Multiagent Systems

DOCENT: Learning Self-Supervised Entity Representations from Large Document Collections

Yury Zemlyanskiy, Sudeep Gandhe, Ruining He (2021)

This paper explores learning rich self-supervised entity representations from large amounts of associated text. Once pre-trained, these models become applicable to multiple entity-centric tasks such as ranked retrieval, knowledge base completion, question answering, and more. Unlike other methods that harvest self-supervision signals based merely on a local context within a sentence, we radically expand the notion of context to include any available text related to an entity. This enables a new class of powerful, high-capacity representations that can ultimately distill much of the useful information about an entity from multiple text sources, without any human supervision. We present several training strategies that, unlike prior approaches, learn to jointly predict words and entities; we compare these strategies experimentally on downstream tasks in the TV-Movies domain, such as MovieLens tag prediction from user reviews and natural language movie search. As the results show, our models match or outperform competitive baselines, sometimes with little or no fine-tuning, and can scale to very large corpora. Finally, we make our datasets and pre-trained models publicly available. These include Reviews2Movielens (see https://goo.gle/research-docent), which maps the up-to-1B-word corpus of Amazon movie reviews (He and McAuley, 2016) to MovieLens tags (Harper and Konstan, 2016), as well as Reddit Movie Suggestions (see https://urikz.github.io/docent), with natural language queries and corresponding community recommendations.

Computation and Language

A Model for Managing Collections of Patterns

Baptiste Jeudy (2009)

Data mining algorithms are now able to deal efficiently with huge amounts of data. Various kinds of patterns may be discovered, and they may have a great impact on the general development of knowledge. In many domains, end users may want their data mined by data mining tools in order to extract patterns that could impact their business. Nevertheless, these users are often overwhelmed by the large quantity of patterns extracted in such a situation. Moreover, privacy or commercial issues may prevent users from mining the data themselves. Thus, users may not be able to perform many experiments integrating various constraints in order to focus on the specific patterns they would like to extract. Post-processing of patterns may be an answer to this drawback. In this paper, we therefore present a framework that could allow end users to manage collections of patterns. We propose to use an efficient data structure on which algebraic operators may be applied in order to retrieve or access patterns in pattern bases.

Artificial Intelligence

Unsupervised Document Embedding With CNNs

Chundi Liu, Shunan Zhao, Maksims Volkovs (2017)

We propose a new model for unsupervised document embedding. Leading existing approaches either require complex inference or use recurrent neural networks (RNNs) that are difficult to parallelize. We take a different route and develop a convolutional neural network (CNN) embedding model. Our CNN architecture is fully parallelizable, resulting in an over 10x speedup in inference time compared with RNN models. The parallelizable architecture makes it possible to train deeper models, in which each successive layer has an increasingly larger receptive field and models longer-range semantic structure within the document. We additionally propose a fully unsupervised learning algorithm, based on stochastic forward prediction, to train this model. Empirical results on two public benchmarks show that our approach achieves accuracy comparable to the state of the art at a fraction of the computational cost.

Computation and Language, Machine Learning
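The growing receptive field mentioned in the abstract above can be made concrete with a small calculation. This is a sketch under assumptions not stated in the abstract: the stack is made of stride-1 1D convolutions with no pooling, and the kernel widths are hypothetical. Under those assumptions each layer with kernel width k widens the window seen by one output position by k - 1 tokens; `receptive_fields` is a hypothetical helper name.

```python
def receptive_fields(kernel_sizes):
    """Receptive field (in tokens) after each layer of a stride-1, pooling-free 1D conv stack."""
    fields = []
    r = 1  # before any convolution, each position sees a single token
    for k in kernel_sizes:
        r += k - 1  # a width-k kernel adds k - 1 neighbouring tokens
        fields.append(r)
    return fields


print(receptive_fields([3, 3, 3, 3]))  # [3, 5, 7, 9]
```

With stride 1 the window grows only linearly with depth; strides, pooling, or dilation would make it grow faster, which is why deeper stacks are needed to capture longer-range structure in a document.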

Interactive Knowledge Base Population

Travis Wolfe, Mark Dredze, James Mayfield (2015)

Most work on building knowledge bases has focused on collecting entities and facts from as large a collection of documents as possible. We argue for and describe a new paradigm, which we call Interactive Knowledge Base Population (IKBP), where the focus is on high-recall extraction over a small collection of documents under the supervision of a human expert.

Artificial Intelligence, Computation and Language