An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com

Added by Shadi Shahsavari

Publication date 2020

fields Informatics Engineering

Language English

Authors Shadi Shahsavari - Ehsan Ebrahimzadeh - Behnam Shahbazi

Computation and Language

Reader reviews of literary fiction on social media, especially those in persistent, dedicated forums, create and are in turn driven by underlying narrative frameworks. In their comments about a novel, readers generally include only a subset of characters and their relationships, thus offering a limited perspective on that work. Yet in aggregate, these reviews capture an underlying narrative framework comprised of different actants (people, places, things), their roles, and interactions that we label the consensus narrative framework. We represent this framework in the form of an actant-relationship story graph. Extracting this graph is a challenging computational problem, which we pose as a latent graphical model estimation problem. Posts and reviews are viewed as samples of subgraphs/networks of the hidden narrative framework. Inspired by the qualitative narrative theory of Greimas, we formulate a graphical generative Machine Learning (ML) model where nodes represent actants, and multi-edges and self-loops among nodes capture context-specific relationships. We develop a pipeline of interlocking automated methods to extract key actants and their relationships, and apply it to thousands of reviews and comments posted on Goodreads.com. We manually derive the ground truth narrative framework from SparkNotes, and then use word embedding tools to compare relationships in ground truth networks with our extracted networks. We find that our automated methodology generates highly accurate consensus narrative frameworks: for our four target novels, with approximately 2900 reviews per novel, we report average coverage/recall of important relationships of >80% and an average edge detection rate of >89%. These extracted narrative frameworks can generate insight into how people (or classes of people) read and how they recount what they have read to others.
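The idea of treating each review as a sample subgraph of a hidden story graph can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the triples are invented toy data, the function names `consensus_graph` and `edge_recall` and the `min_support` threshold are assumptions made for illustration, and ground-truth matching is done by exact actant-pair lookup rather than the word-embedding comparison the abstract describes.

```python
from collections import Counter

# Invented toy data: each review contributes a few
# (actant, relationship, actant) triples, i.e. a small subgraph.
reviews = [
    [("Bilbo", "finds", "ring"), ("Gandalf", "visits", "Bilbo")],
    [("Bilbo", "finds", "ring"), ("Smaug", "guards", "treasure")],
    [("Bilbo", "steals from", "Smaug"), ("Bilbo", "finds", "ring")],
    [("Gandalf", "visits", "Bilbo")],
]


def consensus_graph(review_triples, min_support=2):
    """Keep labelled edges that appear in at least `min_support` reviews.

    `min_support` is a hypothetical threshold, not a value from the paper.
    """
    support = Counter()
    for triples in review_triples:
        # Count each edge once per review so long reviews do not dominate.
        support.update(set(triples))
    return {edge: n for edge, n in support.items() if n >= min_support}


def edge_recall(extracted, ground_truth_pairs):
    """Fraction of ground-truth actant pairs linked by some extracted edge.

    Direction and relationship label are ignored; this is a simplification
    of the embedding-based comparison mentioned in the abstract.
    """
    found = {frozenset((subj, obj)) for subj, _, obj in extracted}
    truth = {frozenset(pair) for pair in ground_truth_pairs}
    return len(found & truth) / len(truth)


graph = consensus_graph(reviews)
truth_pairs = [("Bilbo", "ring"), ("Gandalf", "Bilbo"), ("Bilbo", "Smaug")]
recall = edge_recall(graph, truth_pairs)
```

With this toy input only the two edges mentioned by more than one review survive aggregation, so two of the three ground-truth pairs are recovered. A real system would also need entity resolution and relationship clustering before counting.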

Improving a Hybrid Literary Book Recommendation System through Author Ranking

Paula Cristina Vaz, David Martins de Matos, Bruno Martins (2012)

Literary reading is an important activity for individuals, and choosing to read a book can be a long-term commitment, making book choice an important task for book lovers and public library users. In this paper we present a hybrid recommendation system to help readers decide which book to read next. We study book and author recommendation in a hybrid recommendation setting and test our approach on the LitRec data set. Our proposed hybrid book recommendation approach combines two item-based collaborative filtering algorithms to predict books and authors that the user will like. Author predictions are expanded into a book list that is subsequently aggregated with the former list generated through the initial collaborative recommender. Finally, the resulting book list is used to yield the top-n book recommendations. By means of various experiments, we demonstrate that author recommendation can improve overall book recommendation.

Information Retrieval, Digital Libraries

On building an automated responding system for app reviews: What are the characteristics of reviews and their responses?

Phong Minh Vu, Tam The Nguyen, Tung Thanh Nguyen (2019)

Recent studies showed that the dialogs between app developers and app users on app stores are important to increase user satisfaction and apps' overall ratings. However, the large volume of reviews and the limitation of resources discourage app developers from engaging with customers through this channel. One solution to this problem is to develop an Automated Responding System for developers to respond to app reviews in a manner that is most similar to a human response. Toward designing such a system, we have conducted an empirical study of the characteristics of mobile app reviews and their human-written responses. We found that an app review can have multiple fragments at the sentence level with different topics and intentions. Similarly, a response can also be divided into multiple fragments with unique intentions to answer certain parts of the review (e.g., complaints, requests, or information seeking). We have also identified several characteristics of reviews (rating, topics, intentions, quantitative text features) that can be used to rank reviews by their priority of need for response. In addition, we found that the degree of reusability of past responses depends on their context (single app, apps of the same category, and their common features). Last but not least, a response can be reused for another review if some parts of it can be replaced by a placeholder that is either a named entity or a hyperlink. Based on those findings, we discuss the implications of developing an Automated Responding System to help mobile app developers write responses to users' reviews more effectively.

Software Engineering, Computers and Society

An automated pipeline for the discovery of conspiracy and conspiracy theory narrative frameworks: Bridgegate, Pizzagate and storytelling on the web

Timothy R. Tangherlini, Shadi Shahsavari, Behnam Shahbazi (2020)

Although a great deal of attention has been paid to how conspiracy theories circulate on social media and their factual counterpart conspiracies, there has been little computational work done on describing their narrative structures. We present an automated pipeline for the discovery and description of the generative narrative frameworks of conspiracy theories on social media, and actual conspiracies reported in the news media. We base this work on two separate repositories of posts and news articles describing the well-known conspiracy theory Pizzagate from 2016, and the New Jersey conspiracy Bridgegate from 2013. We formulate a graphical generative machine learning model where nodes represent actors/actants, and multi-edges and self-loops among nodes capture context-specific relationships. Posts and news items are viewed as samples of subgraphs of the hidden narrative network. The problem of reconstructing the underlying structure is posed as a latent model estimation problem. We automatically extract and aggregate the actants and their relationships from the posts and articles. We capture context-specific actants and interactant relationships by developing a system of supernodes and subnodes. We use these to construct a network, which constitutes the underlying narrative framework. We show how the Pizzagate framework relies on the conspiracy theorists' interpretation of hidden knowledge to link otherwise unlinked domains of human interaction, and hypothesize that this multi-domain focus is an important feature of conspiracy theories. While Pizzagate relies on the alignment of multiple domains, Bridgegate remains firmly rooted in the single domain of New Jersey politics. We hypothesize that the narrative framework of a conspiracy theory might stabilize quickly in contrast to the narrative framework of an actual one, which may develop more slowly as revelations come to light.

Computation and Language, Social and Information Networks

An Automated Bolide Detection Pipeline for GOES GLM

Jeffrey C. Smith, Robert L. Morris, Clemens Rumpf (2021)

The Geostationary Lightning Mapper (GLM) instrument onboard the GOES 16 and 17 satellites has been shown to be capable of detecting bolides (bright meteors) in Earth's atmosphere. Due to its large, continuous field of view and immediate public data availability, GLM provides a unique opportunity to detect a large variety of bolides, including those in the 0.1 to 3 m diameter range, and complements current ground-based bolide detection systems, which are typically sensitive to smaller events. We present a machine learning-based bolide detection and light curve generation pipeline being developed at NASA Ames Research Center as part of NASA's Asteroid Threat Assessment Project (ATAP). The ultimate goal is to generate a large catalog of calibrated bolide light curves to provide an unprecedented data set which will be used to inform meteor entry models on how incoming bodies interact with the Earth's atmosphere and to infer the pre-entry properties of the impacting bodies. The data set will also be useful for other asteroidal studies. This paper reports on the progress of the first part of this ultimate goal, namely, the automated bolide detection pipeline. Development of the training set, ML model training and iterative improvements in detection performance are presented. The pipeline runs in an automated fashion, and bolide light curves along with other measured properties are promptly published on a NASA-hosted, publicly accessible website, https://neo-bolide.ndc.nasa.gov.

Earth and Planetary Astrophysics, Instrumentation and Methods for Astrophysics

Playing by the Book: An Interactive Game Approach for Action Graph Extraction from Text

Ronen Tamari, Hiroyuki Shindo, Dafna Shahaf (2018)

Understanding procedural text requires tracking entities, actions and effects as the narrative unfolds. We focus on the challenging real-world problem of action-graph extraction from material science papers, where language is highly specialized and data annotation is expensive and scarce. We propose a novel approach, Text2Quest, where procedural text is interpreted as instructions for an interactive game. A learning agent completes the game by executing the procedure correctly in a text-based simulated lab environment. The framework can complement existing approaches and enables richer forms of learning compared to static texts. We discuss potential limitations and advantages of the approach, and release a prototype proof-of-concept, hoping to encourage research in this direction.

Machine Learning, Computation and Language