A key challenge in mining social media data streams is identifying events that are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning of accidents, protests, elections, or breaking news. However, neither the list of events nor the temporal and spatial resolution of each event is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system using social media that is able to detect events at different time and space resolutions. First, to address the unknown spatial resolution of events, a quad-tree method is used to split the geographical space into multiscale regions based on the density of social media data. Then, an unsupervised statistical approach involving the Poisson distribution and a smoothing method highlights regions with an unexpected density of social posts. Further, event duration is precisely estimated by merging events happening in the same region at consecutive time intervals. A post-processing stage is introduced to filter out events that are spam, fake, or wrong. Finally, we incorporate simple semantics by using social media entities to assess the integrity and accuracy of detected events. The proposed method is evaluated using different social media datasets (Twitter and Flickr) for different cities: Melbourne, London, Paris, and New York. To verify the effectiveness of the proposed method, we compare our results with two baseline algorithms, one based on a fixed split of the geographical space and one based on a clustering method. For performance evaluation, we manually compute recall and precision. We also propose a new quality measure named the strength index, which automatically measures how accurate the reported event is.
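The density-driven split and the Poisson test described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `build_quadtree`, `leaves`, `poisson_tail`, `smoothed_rate`, and `is_unusual` are hypothetical, and so are the parameters `max_points`, `max_depth`, `alpha`, and `p_threshold`. Exponential smoothing is assumed here only because the abstract does not say which smoothing method is used.

```python
import math
from dataclasses import dataclass, field


@dataclass
class QuadNode:
    bounds: tuple          # (x0, y0, x1, y1)
    points: list           # posts falling inside this region
    children: list = field(default_factory=list)


def build_quadtree(points, bounds, max_points=4, max_depth=6, depth=0):
    """Split a region into four quadrants while it holds too many posts."""
    node = QuadNode(bounds, points)
    if len(points) <= max_points or depth >= max_depth:
        return node
    x0, y0, x1, y1 = bounds
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    buckets = [[], [], [], []]
    for p in points:
        ix = 0 if p[0] < mx else 1
        iy = 0 if p[1] < my else 1
        buckets[2 * iy + ix].append(p)
    boxes = [(x0, y0, mx, my), (mx, y0, x1, my),
             (x0, my, mx, y1), (mx, my, x1, y1)]
    for box, bucket in zip(boxes, buckets):
        node.children.append(
            build_quadtree(bucket, box, max_points, max_depth, depth + 1))
    return node


def leaves(node):
    """Return the leaf regions, which are small where posts are dense."""
    if not node.children:
        return [node]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):      # accumulate P(X <= k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def smoothed_rate(history, alpha=0.3):
    """Exponentially smoothed expected count (assumed smoothing method)."""
    rate = history[0]
    for count in history[1:]:
        rate = alpha * count + (1 - alpha) * rate
    return rate


def is_unusual(count, history, p_threshold=0.01):
    """Flag a region whose current count is improbable under its usual rate."""
    lam = max(smoothed_rate(history), 1e-9)
    return poisson_tail(count, lam) < p_threshold
```

In this sketch, each leaf region would keep its own history of per-interval post counts, and `is_unusual` would be evaluated for every leaf at every time interval.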
Shaped by human movement, place connectivity is quantified by the strength of spatial interactions among locations. For decades, spatial scientists have researched place connectivity, its applications, and its metrics. The growing popularity of social media provides a new data stream where spatial social interaction measures are largely devoid of privacy issues, easily accessible, and harmonized. In this study, we introduce a global multi-scale place connectivity index (PCI) based on spatial interactions among places revealed by geotagged tweets as a spatiotemporally continuous and easy-to-implement measurement. The multi-scale PCI, demonstrated at the US county level, exhibits a strong positive association with SafeGraph population movement records (10 percent penetration in the US population) and Facebook's social connectedness index (SCI), a popular connectivity index based on social networks. We found that PCI has a strong boundary effect and that it generally follows distance decay, although this force is weaker in more urbanized counties with a denser population. Our investigation further suggests that PCI has great potential in addressing real-world problems that require place connectivity knowledge, exemplified with two applications: 1) modeling the spatial spread of COVID-19 during the early stage of the pandemic and 2) modeling hurricane evacuation destination choice. The methodological and contextual knowledge of PCI, together with the launched visualization platform and open-sourced PCI datasets at various geographic levels, is expected to support research fields requiring knowledge of human spatial interactions.
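As a rough illustration of how a connectivity index might be derived from geotagged posts, the sketch below counts users observed in both of two places and normalizes by the geometric mean of each place's user count. This normalization is an assumption made for the example and is not stated in the abstract. The function name `place_connectivity` and the `(user_id, place_id)` input format are likewise hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def place_connectivity(visits):
    """Compute a pairwise connectivity score from (user_id, place_id) records.

    Assumed formula: shared users / sqrt(users in a * users in b).
    """
    users_by_place = defaultdict(set)
    for user, place in visits:
        users_by_place[place].add(user)

    scores = {}
    for a, b in combinations(sorted(users_by_place), 2):
        shared = len(users_by_place[a] & users_by_place[b])
        norm = math.sqrt(len(users_by_place[a]) * len(users_by_place[b]))
        scores[(a, b)] = shared / norm
    return scores
```

Under this assumed normalization, the score ranges from 0 (no shared users) to 1 (identical user sets), which makes pairs of places with very different population sizes comparable.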
Events happen in the real world and in real time, and they can be planned and organized for occasions such as social gatherings, festival celebrations, influential meetings, or sports activities. Social media platforms generate a large amount of real-time text about public events on different topics. However, mining social events is challenging because events typically exhibit heterogeneous texture and their metadata are often ambiguous. In this paper, we first design a novel event-based meta-schema to characterize the semantic relatedness of social events and then build an event-based heterogeneous information network (HIN) integrating information from an external knowledge base. Second, we propose a novel Pairwise Popularity Graph Convolutional Network, named PP-GCN, which takes weighted meta-path instance similarity and textual semantic representations as inputs, to perform fine-grained social event categorization and learn the optimal weights of meta-paths in different tasks. Third, we propose a streaming social event detection and evolution discovery framework for HINs based on meta-path similarity search, historical information about meta-paths, and a heterogeneous DBSCAN clustering method. Comprehensive experiments on real-world streaming social text data are conducted to compare various social event detection and evolution discovery algorithms. Experimental results demonstrate that our proposed framework outperforms alternative social event detection and evolution discovery techniques.
It has been insufficiently explored how to perform density-based clustering by exploiting textual attributes on social media. In this paper, we aim at discovering a social point-of-interest (POI) boundary, formed as a convex polygon. More specifically, we present a new approach and algorithm, built upon our earlier work on social POI boundary estimation (SoBEst). The SoBEst approach takes into account both relevant and irrelevant records within a geographic area, where relevant records contain a POI name or its variations in their text field. Our study is motivated by the following empirical observation: the fixed representative coordinate that SoBEst assumes for each POI may be far from the centroid of the estimated social POI boundary for certain POIs. Thus, using SoBEst in such cases may result in unsatisfactory performance on the boundary estimation quality (BEQ), which is expressed as a function of the $F$-measure. To solve this problem, we formulate a joint optimization problem of simultaneously finding the radius of a circle and the POI's representative coordinate $c$ by allowing $c$ to be updated. Subsequently, we design an iterative SoBEst (I-SoBEst) algorithm, which enables us to achieve a higher degree of BEQ for some POIs. The computational complexity of the proposed I-SoBEst algorithm is shown to scale linearly with the number of records. We demonstrate the superiority of our algorithm over competing clustering methods, including the original SoBEst.
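The idea of jointly searching for a radius and a representative coordinate $c$ can be sketched with a simplified alternating procedure. This is not the I-SoBEst algorithm itself: it scores a circle directly by its $F$-measure rather than building a convex polygon, it updates $c$ to the centroid of the relevant records inside the circle (an assumed update rule), and it makes no attempt to match the linear complexity reported above. All function names are hypothetical.

```python
import math


def f_measure(center, radius, relevant, irrelevant):
    """F-measure of a circle: relevant records inside count as true positives."""
    tp = sum(1 for p in relevant if math.dist(p, center) <= radius)
    fp = sum(1 for p in irrelevant if math.dist(p, center) <= radius)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / len(relevant)
    return 2 * precision * recall / (precision + recall)


def best_radius(center, relevant, irrelevant):
    """Try each relevant record's distance as a radius; keep the best score."""
    candidates = sorted({math.dist(p, center) for p in relevant})
    return max(candidates,
               key=lambda r: f_measure(center, r, relevant, irrelevant))


def refine(center, relevant, irrelevant, max_iter=20):
    """Alternate between choosing the radius and moving the center."""
    radius = best_radius(center, relevant, irrelevant)
    score = f_measure(center, radius, relevant, irrelevant)
    for _ in range(max_iter):
        inside = [p for p in relevant if math.dist(p, center) <= radius]
        # Assumed update rule: move c to the centroid of covered relevant records.
        new_center = (sum(p[0] for p in inside) / len(inside),
                      sum(p[1] for p in inside) / len(inside))
        new_radius = best_radius(new_center, relevant, irrelevant)
        new_score = f_measure(new_center, new_radius, relevant, irrelevant)
        if new_score <= score:   # stop when the score no longer improves
            break
        center, radius, score = new_center, new_radius, new_score
    return center, radius, score
```

The sketch shows why a poorly placed starting coordinate can hurt: a circle centered far from the relevant records must grow large enough to cover them and therefore also captures irrelevant ones, which lowers precision.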
Deep generative models are increasingly used to gain insights in the geospatial data domain, e.g., for climate data. However, most existing approaches work with temporal snapshots or assume 1D time series; few are able to capture spatial and temporal processes simultaneously. Beyond this, Earth-systems data often exhibit highly irregular and complex patterns, for example caused by extreme weather events. Because of climate change, these phenomena are only increasing in frequency. Here, we propose a novel GAN-based approach for generating spatio-temporal weather patterns conditioned on detected extreme events. Our approach augments the GAN generator and discriminator with an encoded extreme weather event segmentation mask. These segmentation masks can be created from raw input using existing event detection frameworks. As such, our approach is highly modular and can be combined with custom GAN architectures. We highlight the applicability of our proposed approach in experiments with real-world surface radiation and zonal wind data.
COVID-19 has caused lasting damage to almost every domain of public health, society, and the economy. To monitor the pandemic trend, existing studies rely on a combination of traditional statistical models and epidemic spread theory. In other words, historical statistics of COVID-19, as well as population mobility data, become the essential knowledge for monitoring the pandemic trend. However, these solutions can barely provide precise predictions and satisfactory explanations for long-term disease surveillance, while ubiquitous social media resources can be the key enabler for solving this problem. For example, serious discussions may occur on social media before and after breaking events take place. These events, such as marathons and parades, may impact the spread of the virus. To take advantage of social media data, we propose a novel framework, Social Media enhAnced pandemic suRveillance Technique (SMART), which is composed of two modules: (i) an information extraction module that constructs heterogeneous knowledge graphs based on the extracted events and the relationships among them; and (ii) a time series prediction module that provides both short-term and long-term forecasts of confirmed cases and fatalities at the state level in the United States and discovers risk factors for COVID-19 interventions. Extensive experiments show that our method largely outperforms the state-of-the-art baselines by 7.3% and 7.4% in confirmed case and fatality prediction, respectively.