Various domain users increasingly leverage real-time social media data to gain rapid situational awareness. However, the deluge of data is highly noisy, making it difficult to identify semantically relevant information, and the task is further complicated because each end user's definition of relevance changes from event to event. Most existing methods for short-text relevance classification fail to incorporate users' knowledge into the classification process, and the methods that do incorporate interactive user feedback focus on historical datasets. Classifiers therefore cannot be interactively retrained in real time for specific events or user-dependent needs. This limits real-time situational awareness: streaming data that is misclassified cannot be corrected immediately, so important incoming data may be misclassified as well. We present a novel interactive learning framework in which the user iteratively corrects the relevancy of tweets in real time, training the classification model on the fly for immediate predictive improvements. We computationally evaluate our classification model, which is adapted to learn at interactive rates, and our results show that our approach outperforms state-of-the-art machine learning models. In addition, we integrate our framework with the extended Social Media Analytics and Reporting Toolkit (SMART) 2.0 system, making our interactive learning framework available within a visual analytics system tailored for real-time situational awareness. To demonstrate our framework's effectiveness, we provide domain expert feedback from first responders who used the extended SMART 2.0 system.
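The retrain-on-correction loop described above can be sketched with a toy online learner. Everything below — the hashed bag-of-words featurizer, the perceptron update, and the example tweets and labels — is an illustrative assumption, not the paper's actual model:

```python
import zlib

def featurize(text, dim=65536):
    """Hash a bag of words into a sparse {index: count} vector.
    zlib.crc32 is used so hashing is stable across runs."""
    vec = {}
    for tok in text.lower().split():
        idx = zlib.crc32(tok.encode("utf-8")) % dim
        vec[idx] = vec.get(idx, 0) + 1
    return vec

class OnlinePerceptron:
    """Tiny linear model updated one example at a time (stand-in for the
    paper's interactive-rate classifier)."""
    def __init__(self, dim=65536, lr=1.0):
        self.w = [0.0] * dim
        self.lr = lr

    def predict(self, vec):
        score = sum(self.w[i] * v for i, v in vec.items())
        return 1 if score >= 0 else 0  # 1 = relevant, 0 = irrelevant

    def correct(self, vec, label):
        """Apply a user's relevancy correction immediately, on the fly."""
        if self.predict(vec) != label:
            sign = 1 if label == 1 else -1
            for i, v in vec.items():
                self.w[i] += self.lr * sign * v

model = OnlinePerceptron()
stream = [  # (tweet text, relevancy label supplied by the analyst)
    ("flooding reported downtown near the river", 1),
    ("great pizza tonight with friends", 0),
    ("water rising fast evacuate downtown now", 1),
]
for text, user_label in stream:
    vec = featurize(text)
    _prediction = model.predict(vec)   # shown to the analyst in the UI
    model.correct(vec, user_label)     # analyst's correction retrains the model
```

The key property mirrored here is that each correction updates the model before the next streaming item is scored, so a fixed misclassification cannot propagate to subsequent incoming data.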
The first responder community has traditionally relied on calls from the public, officially provided geographic information, and maps for coordinating actions on the ground. The ubiquity of social media platforms has created an opportunity for near real-time sensing of unfolding situations (e.g., weather events or crises) through volunteered geographic information. In this article, we provide an overview of the design process and features of the Social Media Analytics Reporting Toolkit (SMART), a visual analytics platform developed at Purdue University to provide first responders with real-time situational awareness. We attribute its successful adoption by many first responders to its user-centered design, interactive (geo)visualizations, and interactive machine learning, which give users control over the analysis.
Since the start of COVID-19, several relevant corpora from various sources, containing millions of data points, have been presented in the literature. While these corpora are valuable for analyses of this specific pandemic, researchers also require benchmark corpora covering other epidemics to facilitate cross-epidemic pattern recognition and trend analysis. During our other COVID-19-related work, we found few disease-related corpora in the literature that are sizable and rich enough to support such cross-epidemic analyses. In this paper, we present EPIC30M, a large-scale epidemic corpus of 30 million micro-blog posts, i.e., tweets crawled from Twitter, spanning 2006 to 2020. EPIC30M contains a subset of 26.2 million tweets related to three general diseases, namely Ebola, Cholera, and Swine Flu, and another subset of 4.7 million tweets on six global epidemic outbreaks: the 2009 H1N1 Swine Flu, 2010 Haiti Cholera, 2012 Middle East Respiratory Syndrome (MERS), 2013 West African Ebola, 2016 Yemen Cholera, and 2018 Kivu Ebola outbreaks. Furthermore, we explore and discuss the properties of the corpus through statistics of key terms and hashtags and trend analyses for each subset. Finally, we demonstrate the value and impact that EPIC30M could create by discussing multiple use cases of cross-epidemic research topics that have attracted growing interest in recent years. These use cases span multiple research areas, including epidemiological modeling, pattern recognition, natural language understanding, and economic modeling.
The number of missing people (i.e., people who get lost) has greatly increased in recent years. This is a serious worldwide problem, and finding missing people consumes a large amount of social resources. Timely data gathering and analysis play an important role in tracking and finding these missing people. With the development of social media, information about missing people can propagate through the web very quickly, which provides a promising way to address the problem. Information in online social media is usually of heterogeneous categories, involving both complex social interactions and textual data of diverse structures, and effectively fusing these different types of information for missing people identification is a great challenge. Motivated by the multi-instance learning problem and the existing social science theory of homophily, in this paper we propose a novel r-instance (RI) learning model.
Real-time tweets can provide useful information on evolving events and situations. Geotagged tweets are especially useful, as they indicate the location of origin and provide geographic context. However, only a small portion of tweets are geotagged, limiting their use for situational awareness. In this paper, we adapt, improve, and evaluate a state-of-the-art deep learning model for city-level geolocation prediction, and integrate it with a visual analytics system tailored for real-time situational awareness. We provide computational evaluations to demonstrate the superiority and utility of our geolocation prediction model within an interactive system.
Radical right influencers routinely use social media to spread highly divisive, disruptive, and anti-democratic messages. Assessing and countering the challenge that such content poses is crucial for ensuring that online spaces remain open, safe, and accessible. Previous work has paid little attention to the factors associated with radical right content going viral. We investigate this issue with ROT, a new dataset that provides insight into the content, engagement, and followership of a set of 35 radical right influencers. It includes over 50,000 original entries and over 40 million retweets, quotes, replies, and mentions. We use a multilevel model to measure engagement with tweets, which are nested within influencers. We show that it is crucial to account for this influencer-level structure, and we find evidence for the importance of both influencer- and content-level factors, including the number of followers each influencer has, the type of content (original posts, quotes, and replies), the length and toxicity of content, and whether influencers request retweets. We make ROT available for other researchers to use.
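As a rough intuition for why the influencer-nested (multilevel) structure matters, a random-intercept model partially pools each influencer's average engagement toward the grand mean, shrinking noisy estimates from influencers with few tweets more strongly. The toy partial-pooling estimator below is only a sketch of that idea, not the paper's model; the pooling strength `k`, influencer names, and engagement counts are invented for illustration:

```python
def partial_pooled_intercepts(engagement_by_influencer, k=5.0):
    """Shrink each influencer's mean engagement toward the grand mean.
    Larger groups (more tweets) are shrunk less, as in a random-intercept
    multilevel model; k is an assumed pooling strength."""
    all_vals = [v for vals in engagement_by_influencer.values() for v in vals]
    grand_mean = sum(all_vals) / len(all_vals)
    intercepts = {}
    for influencer, vals in engagement_by_influencer.items():
        n = len(vals)
        group_mean = sum(vals) / n
        intercepts[influencer] = (n * group_mean + k * grand_mean) / (n + k)
    return intercepts

data = {
    "influencer_a": [120, 90, 150, 110, 130, 95, 140, 100],  # many tweets
    "influencer_b": [400],                                   # one viral tweet
}
est = partial_pooled_intercepts(data)
```

With these invented numbers, the single-tweet influencer's estimate is pulled well below 400 toward the grand mean, while the many-tweet influencer's estimate stays close to its own average — the kind of regularization a complete-pooling or no-pooling analysis would miss.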