Real-time tweets can provide useful information on evolving events and situations. Geotagged tweets are especially useful, as they indicate the location of origin and provide geographic context. However, only a small portion of tweets are geotagged, limiting their use for situational awareness. In this paper, we adapt, improve, and evaluate a state-of-the-art deep learning model for city-level geolocation prediction, and integrate it with a visual analytics system tailored for real-time situational awareness. We provide computational evaluations to demonstrate the superiority and utility of our geolocation prediction model within an interactive system.
Users across many domains are increasingly leveraging real-time social media data to gain rapid situational awareness. However, because the deluge of data is highly noisy, determining which information is semantically relevant can be difficult, and the task is further complicated because each end user defines relevance differently for different events. Most existing methods for short-text relevance classification fail to incorporate users' knowledge into the classification process, and existing methods that do incorporate interactive user feedback focus on historical datasets. As a result, classifiers cannot be interactively retrained in real time for specific events or user-dependent needs. This limits real-time situational awareness: streaming data that is misclassified cannot be corrected immediately, so important incoming data may be misclassified as well. We present a novel interactive learning framework to improve the classification process, in which the user iteratively corrects the relevance of tweets in real time to train the classification model on the fly for immediate predictive improvements. We computationally evaluate our classification model, adapted to learn at interactive rates. Our results show that our approach outperforms state-of-the-art machine learning models. In addition, we integrate our framework with the extended Social Media Analytics and Reporting Toolkit (SMART) 2.0 system, allowing our interactive learning framework to be used within a visual analytics system tailored for real-time situational awareness. To demonstrate our framework's effectiveness, we provide domain expert feedback from first responders who used the extended SMART 2.0 system.
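The abstract does not say which classifier is retrained on the fly. A minimal sketch of the general idea, assuming an online logistic-regression model over bag-of-words tokens that takes one stochastic-gradient step each time the user corrects a tweet's relevance label, could look like this. The class `OnlineRelevanceClassifier`, its tokenizer, the learning rate, and the example tweets are hypothetical choices for illustration, not the framework's actual model.

```python
import math
import re


class OnlineRelevanceClassifier:
    """Hypothetical online logistic-regression relevance classifier.

    Each user correction triggers a single SGD step, so the model
    improves immediately instead of waiting for batch retraining.
    """

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.weights = {}  # token -> weight
        self.bias = 0.0

    @staticmethod
    def _tokens(text):
        # Simple bag-of-words: lowercase word tokens, each counted once.
        return set(re.findall(r"\w+", text.lower()))

    def predict_proba(self, text):
        z = self.bias + sum(self.weights.get(t, 0.0) for t in self._tokens(text))
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def predict(self, text):
        return self.predict_proba(text) >= 0.5

    def correct(self, text, is_relevant):
        """Apply one SGD step on the log loss for a user-corrected label."""
        error = (1.0 if is_relevant else 0.0) - self.predict_proba(text)
        step = self.learning_rate * error
        for t in self._tokens(text):
            self.weights[t] = self.weights.get(t, 0.0) + step
        self.bias += step


# Hypothetical stream of user corrections arriving during an event.
clf = OnlineRelevanceClassifier()
corrections = [("Flooding on Main Street, road closed", True),
               ("Great pizza tonight", False)]
for _ in range(5):
    for text, label in corrections:
        clf.correct(text, label)
print(clf.predict("road closed due to flooding"))
print(clf.predict("pizza tonight"))
```

Because each correction costs time proportional only to the number of tokens in one tweet, an update like this can run at interactive rates; a deployed system would likely use richer text features and a stronger model.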
Large-scale interaction networks of human communication are often modeled as complex graph structures, obscuring temporal patterns within individual conversations. To facilitate the understanding of such conversational dynamics, episodes with low or high communication activity as well as breaks in communication need to be detected to enable the identification of temporal interaction patterns. Traditional episode detection approaches are highly dependent on the choice of parameters, such as window size or binning resolution. In this paper, we present a novel technique for the identification of relevant episodes in bi-directional interaction sequences from abstract communication networks. We model communication as a continuous density function, allowing for a more robust segmentation into individual episodes and estimation of communication volume. Additionally, we define a tailored feature set to characterize conversational dynamics and enable a user-steered classification of communication behavior. We apply our technique to a real-world corpus of email data from a large European research institution. The results show that our technique allows users to effectively define, identify, and analyze relevant communication episodes.
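One way to model communication as a continuous density function is a kernel density estimate over message timestamps. The sketch below assumes a Gaussian kernel with a fixed bandwidth and a fixed activity threshold; both are hypothetical parameters, and the paper's actual density model and segmentation rule are not given here. Episodes are the intervals where the estimated intensity exceeds the threshold, and integrating the intensity over an episode gives an estimate of its communication volume.

```python
import math


def density(timestamps, t, bandwidth):
    """Gaussian-kernel estimate of message intensity (messages per unit time) at t."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((t - ti) / bandwidth) ** 2) for ti in timestamps)


def detect_episodes(timestamps, bandwidth, threshold, step):
    """Return (start, end) grid intervals where the intensity is >= threshold."""
    if not timestamps:
        return []
    lo = min(timestamps) - 3 * bandwidth
    hi = max(timestamps) + 3 * bandwidth
    episodes, start = [], None
    for i in range(int((hi - lo) / step) + 1):
        t = lo + i * step
        active = density(timestamps, t, bandwidth) >= threshold
        if active and start is None:
            start = t  # intensity rises above the threshold: an episode begins
        elif not active and start is not None:
            episodes.append((start, t))  # intensity drops below: the episode ends
            start = None
    if start is not None:
        episodes.append((start, hi))
    return episodes


def episode_volume(timestamps, episode, bandwidth):
    """Exact integral of the intensity over [start, end], i.e. the estimated message count."""
    start, end = episode
    s = bandwidth * math.sqrt(2.0)
    return sum(0.5 * (math.erf((end - ti) / s) - math.erf((start - ti) / s))
               for ti in timestamps)


# Hypothetical message times (in hours) from one bi-directional conversation.
times = [1.0, 1.2, 1.5, 1.7, 10.0, 10.3, 10.4]
episodes = detect_episodes(times, bandwidth=0.5, threshold=0.2, step=0.1)
for start, end in episodes:
    vol = episode_volume(times, (start, end), 0.5)
    print(round(start, 2), round(end, 2), round(vol, 2))
```

Unlike fixed bins, the estimate varies smoothly with time, so episode boundaries do not snap to arbitrary bin edges; the bandwidth still acts as a smoothing parameter that has to be chosen.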
A common network analysis task is comparing two networks to identify characteristics unique to one network with respect to the other. For example, when comparing protein interaction networks derived from normal and cancer tissues, one essential task is to discover protein-protein interactions unique to cancer tissues. However, this task is challenging when the networks contain complex structural (and semantic) relations. To address this problem, we design ContraNA, a visual analytics framework that leverages both the power of machine learning for uncovering unique characteristics in networks and the effectiveness of visualization for understanding such uniqueness. The basis of ContraNA is cNRL, which integrates two machine learning schemes, network representation learning (NRL) and contrastive learning (CL), to generate a low-dimensional embedding that reveals the uniqueness of one network when compared to another. ContraNA provides an interactive visualization interface that helps analyze this uniqueness by relating embedding results to network structures and by explaining the features learned by cNRL. We demonstrate the usefulness of ContraNA with two case studies using real-world datasets. We also evaluate ContraNA through a controlled user study with 12 participants on network comparison tasks. The results show that participants were able both to effectively identify unique characteristics in complex networks and to interpret the results obtained from cNRL.
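The abstract does not spell out how NRL and CL are combined. As a hedged illustration, assume each node already has a feature vector produced by some network representation learning method, and that the contrastive step resembles contrastive PCA (cPCA): find directions of high variance in the target network's features after subtracting α times the covariance of the background network's features. The helper names, the value of α, and the toy data are hypothetical; this is a sketch of cPCA, not cNRL's implementation.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def top_contrastive_direction(target, background, alpha, iters=500):
    """Leading eigenvector of C_target - alpha * C_background via power iteration."""
    ct, cb = covariance(target), covariance(background)
    d = len(ct)
    c = [[ct[i][j] - alpha * cb[i][j] for j in range(d)] for i in range(d)]
    # The contrastive matrix can have negative eigenvalues. Shifting by a
    # Gershgorin bound makes it positive semidefinite, so power iteration
    # returns the eigenvector of the largest (not largest-magnitude) eigenvalue.
    shift = max(sum(abs(x) for x in row) for row in c)
    m = [[c[i][j] + (shift if i == j else 0.0) for j in range(d)] for i in range(d)]
    v = [1.0 / (k + 1) for k in range(d)]  # arbitrary non-symmetric start vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(rows, direction):
    """1-D contrastive embedding: each row's coordinate along the direction."""
    return [sum(a * b for a, b in zip(r, direction)) for r in rows]


# Toy 2-D node features: both networks vary strongly along feature 0,
# but only the target network varies along feature 1.
target = [(-2.0, -1.0), (2.0, -1.0), (-2.0, 1.0), (2.0, 1.0)]
background = [(-2.0, 0.0), (2.0, 0.0), (-2.0, 0.0), (2.0, 0.0)]
direction = top_contrastive_direction(target, background, alpha=1.0)
print([round(x, 3) for x in direction])
print(project(target, direction))
```

With α = 0 the same function reduces to ordinary PCA on the target and picks feature 0, the variation both networks share; the contrastive term suppresses that shared variation and surfaces feature 1, which is unique to the target.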
Describing the dynamics of a city is a crucial step both to understanding human activity in urban environments and to planning and designing cities accordingly. Here we describe the collective dynamics of New York City and surrounding areas as seen through the lens of Twitter usage. In particular, we observe and quantify the patterns that emerge naturally from hourly activity in different areas of New York City, and discuss how they can be used to understand urban areas. Using a dataset of more than 6 million geolocated Twitter messages, we construct a movie of the geographic density of tweets. We observe the diurnal heartbeat of the NYC area. The largest-scale dynamics are the waking and sleeping cycle and commuting from residential communities to office areas in Manhattan. Hourly dynamics reflect the interplay of commuting, work, and leisure, including whether people are preoccupied with other activities or actively using Twitter. Differences between weekday and weekend dynamics point to changes in when people wake, sleep, and engage in social activities. We show that measuring the average distance of tweets from the heart of the city quantifies these weekly differences and the shift in behavior during weekends. We also identify locations and times of high Twitter activity that arise from specific activities. These include high early-morning Twitter traffic at air transportation hubs as people arrive and wait there, and high Sunday activity at the Meadowlands Sports Complex and the Statue of Liberty. We analyze cases in which particular individuals have a large impact on overall Twitter activity. Our analysis points to the opportunity to develop insight into both geographic social dynamics and attention through social media analysis.
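Measuring the average distance to the heart of the city can be illustrated with great-circle (haversine) distances. The sketch below assumes tweets arrive as (hour, latitude, longitude) tuples and uses an assumed Midtown Manhattan reference point as the heart of the city; the paper's actual center and aggregation may differ, and the names `CITY_CENTER` and `mean_distance_by_hour` as well as the sample points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: an approximate Midtown Manhattan point stands in for the "heart of the city".
CITY_CENTER = (40.758, -73.9855)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mean_distance_by_hour(tweets, center=CITY_CENTER):
    """tweets: iterable of (hour, lat, lon). Returns {hour: mean km to center}."""
    sums, counts = {}, {}
    for hour, lat, lon in tweets:
        d = haversine_km((lat, lon), center)
        sums[hour] = sums.get(hour, 0.0) + d
        counts[hour] = counts.get(hour, 0) + 1
    return {h: sums[h] / counts[h] for h in sums}


# Hypothetical geotagged tweets: midday activity at the center, evening activity farther out.
sample = [(9, 40.758, -73.9855), (9, 40.758, -73.9855),
          (22, 40.65, -73.95), (22, 40.70, -73.80)]
result = mean_distance_by_hour(sample)
print({h: round(km, 1) for h, km in sorted(result.items())})
```

Grouping by a key such as (weekday or weekend, hour) instead of hour alone would let the weekday and weekend curves be compared directly.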
When does Internet traffic cross international borders? This question has major geopolitical, legal and social implications and is surprisingly difficult to answer. A critical stumbling block is a dearth of tools that accurately map routers traversed by Internet traffic to the countries in which they are located. This paper presents Passport: a new approach for efficient, accurate country-level router geolocation and a system that implements it. Passport provides location predictions with limited active measurements, using machine learning to combine information from IP geolocation databases, router hostnames, whois records, and ping measurements. We show that Passport substantially outperforms existing techniques, and identify cases where paths traverse countries with implications for security, privacy, and performance.
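Passport combines several evidence sources with machine learning, and its features and model are not detailed here. One standard way ping measurements constrain router location, shown below as a hedged sketch, is a speed-of-light feasibility check: a signal in optical fiber travels at roughly two-thirds of the speed of light in vacuum (about 200 km per millisecond), so a round-trip time bounds how far the router can be from the vantage point. Representing each candidate country by a single point, and the names and coordinates used, are simplifying assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # approx. 2/3 of the speed of light in vacuum (~299.8 km/ms)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def max_distance_km(rtt_ms):
    # The probe travels out and back, so the one-way distance is at most half the RTT.
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


def feasible_countries(candidates, measurements):
    """candidates: {country: (lat, lon)} hypothetical location hints, e.g. from
    geolocation databases or hostnames. measurements: [((lat, lon), rtt_ms)]
    pings from known vantage points. Keep countries consistent with every RTT."""
    return {country for country, loc in candidates.items()
            if all(haversine_km(vp, loc) <= max_distance_km(rtt)
                   for vp, rtt in measurements)}


# Hypothetical example: a 4 ms RTT from a Frankfurt vantage point rules out
# a Washington, D.C. hint but not a Frankfurt one.
candidates = {"DE": (50.11, 8.68), "US": (38.90, -77.04)}
measurements = [((50.11, 8.68), 4.0)]
print(feasible_countries(candidates, measurements))
```

Because routing detours and queuing only add delay, the bound is conservative: it can fail to exclude a wrong country but should not exclude the right one. In a learned system it fits more naturally as one input feature or a hard filter than as the whole decision.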