The patterns of life exhibited by large populations have been described and modeled both as a basic science exercise and for a range of applied goals such as reducing automotive congestion, improving disaster response, and even predicting the location of individuals. However, these studies previously had limited access to conversation content, rendering changes in expression as a function of movement invisible. In addition, they typically use the communication between a mobile phone and its nearest antenna tower to infer position, limiting the spatial resolution of the data to the geographical region serviced by each cellphone tower. We use a collection of 37 million geolocated tweets to characterize the movement patterns of 180,000 individuals, taking advantage of several orders of magnitude of increased spatial accuracy relative to previous work. Employing the recently developed sentiment analysis instrument known as the hedonometer, we characterize changes in word usage as a function of movement, and find that expressed happiness increases logarithmically with distance from an individual's average location.
Individual happiness is a fundamental societal metric. Normally measured through self-report, happiness has often been indirectly characterized and overshadowed by more readily quantifiable economic indicators such as gross domestic product. Here, we examine expressions made on the online, global microblog and social networking service Twitter, uncovering and explaining temporal variations in happiness and information levels over timescales ranging from hours to years. Our data set comprises over 46 billion words contained in nearly 4.6 billion expressions posted over a 33-month span by over 63 million unique users. In measuring happiness, we use a real-time, remote-sensing, non-invasive, text-based approach, a kind of hedonometer. In building our metric, made available with this paper, we conducted a survey to obtain happiness evaluations of over 10,000 individual words, representing a tenfold size improvement over similar existing word sets. Rather than being ad hoc, our word list is chosen solely by frequency of usage and we show how a highly robust metric can be constructed and defended.
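A text-based happiness measure of this kind can be illustrated with a minimal sketch, assuming the metric is a frequency-weighted average of per-word happiness scores. The abstract does not state the formula, so this is an assumption. The word scores, the 1 to 9 scale, and the function name `average_happiness` are hypothetical and chosen only for illustration; they are not the survey values released with the paper.

```python
from collections import Counter

# Hypothetical word scores on an assumed 1-9 happiness scale.
# Illustrative values only, not the survey results from the paper.
WORD_SCORES = {"love": 8.4, "happy": 8.3, "rain": 5.0, "sad": 2.4, "hate": 2.3}


def average_happiness(text, scores=WORD_SCORES):
    """Return the frequency-weighted mean score of the scored words in text.

    Words without a score are ignored. Returns None if no word is scored.
    """
    counts = Counter(w for w in text.lower().split() if w in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * n for w, n in counts.items()) / total
```

Under these assumptions, `average_happiness("love love sad")` averages two occurrences of "love" with one of "sad", and a text with no scored words yields `None` rather than a misleading number.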
The relationship between nature contact and mental well-being has received increasing attention in recent years. While a body of evidence has accumulated demonstrating a positive relationship between time in nature and mental well-being, there have been few studies comparing this relationship in different locations over long periods of time. In this study, we estimate a happiness benefit, the difference in expressed happiness between in- and out-of-park tweets, for the 25 largest cities in the US by population. People write happier words during park visits when compared with non-park user tweets collected around the same time. While the words people write are happier in parks on average and in most cities, we find considerable variation across cities. Tweets are happier in parks at all times of the day, week, and year, not just during the weekend or summer vacation. Across all cities, we find that the happiness benefit is highest in parks larger than 100 acres. Overall, our study suggests the happiness benefit associated with park visitation is on par with US holidays such as Thanksgiving and New Year's Day.
The emergence of large stores of transactional data generated by increasing use of digital devices presents a huge opportunity for policymakers to improve their knowledge of the local environment and thus make better-informed decisions. A research frontier is hence emerging which involves exploring the type of measures that can be drawn from data stores such as mobile phone logs, Internet searches and contributions to social media platforms, and the extent to which these measures are accurate reflections of the wider population. This paper contributes to this research frontier by exploring the extent to which local commuting patterns can be estimated from data drawn from Twitter. It makes three contributions in particular. First, it shows that simple heuristics drawn from geolocated Twitter data offer a good proxy for local commuting patterns, one which outperforms the major existing method for estimating these patterns (the radiation model). Second, it investigates sources of error in the proxy measure, showing that the model performs better on short trips with higher volumes of commuters; it also looks at demographic biases but finds that, surprisingly, measurements are not significantly affected by the fact that the demographic makeup of Twitter users differs significantly from the population as a whole. Finally, it looks at potential ways of going beyond simple heuristics by incorporating temporal information into models.
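For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the radiation model in its standard published form, which the abstract names but does not spell out. The predicted flow from origin i to destination j is T_i * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij)), where m_i and n_j are the origin and destination populations, s_ij is the population within distance d(i, j) of the origin excluding both endpoints, and T_i is the number of commuters leaving i. The function name `radiation_flows` and the flat-plane Euclidean distance are assumptions made for illustration.

```python
import math


def radiation_flows(pops, coords, out_commuters):
    """Predict commuter flows T[i][j] with the radiation model.

    pops: population of each location.
    coords: (x, y) position of each location (planar distance is assumed).
    out_commuters: total number of commuters leaving each location.
    """
    n = len(pops)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dist = [math.dist(coords[i], coords[k]) for k in range(n)]
        for j in range(n):
            if i == j:
                continue
            # s_ij: population closer to i than j is, excluding i and j.
            s = sum(pops[k] for k in range(n)
                    if k != i and k != j and dist[k] <= dist[j])
            m, nj = pops[i], pops[j]
            flows[i][j] = out_commuters[i] * m * nj / ((m + s) * (m + nj + s))
    return flows
```

The model needs only populations and distances, with no fitted parameters, which is why it is a common reference point for proxy measures such as the Twitter-based heuristics described here.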
The increasing availability of temporal network data is calling for more research on extracting and characterizing mesoscopic structures in temporal networks and on relating such structure to specific functions or properties of the system. An outstanding challenge is the extension of the results achieved for static networks to time-varying networks, where the topological structure of the system and the temporal activity patterns of its components are intertwined. Here we investigate the use of a latent factor decomposition technique, non-negative tensor factorization, to extract the community-activity structure of temporal networks. The method is intrinsically temporal and allows us to simultaneously identify communities and track their activity over time. We represent the time-varying adjacency matrix of a temporal network as a three-way tensor and approximate this tensor as a sum of terms that can be interpreted as communities of nodes with an associated activity time series. We summarize known computational techniques for tensor decomposition and discuss some quality metrics that can be used to tune the complexity of the factorized representation. We subsequently apply tensor factorization to a temporal network for which a ground truth is available for both the community structure and the temporal activity patterns. The data we use describe the social interactions of students in a school, the associations between students and school classes, and the spatio-temporal trajectories of students over time. We show that non-negative tensor factorization is capable of recovering the class structure with high accuracy. In particular, the extracted tensor components can be validated either as known school classes, or in terms of correlated activity patterns, i.e., of spatial and temporal coincidences that are determined by the known school activity schedule.
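The decomposition described above can be sketched as follows, assuming a tensor X of shape (nodes, nodes, time steps) approximated by a sum of `rank` non-negative terms, X[i, j, k] ≈ sum_r A[i, r] * B[j, r] * C[k, r]. The sketch uses multiplicative updates, one simple option among the known techniques; the abstract does not say which algorithm the authors use, and the names `ntf` and `reconstruct` are hypothetical.

```python
import numpy as np


def ntf(X, rank, n_iter=200, eps=1e-12, seed=0):
    """Non-negative three-way factorization via multiplicative updates.

    Returns factor matrices A, B (node memberships) and C (activity over time).
    """
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], rank))
    B = rng.random((X.shape[1], rank))
    C = rng.random((X.shape[2], rank))
    for _ in range(n_iter):
        # Each update rescales one factor while the other two are held fixed;
        # non-negative inputs keep every factor non-negative.
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C


def reconstruct(A, B, C):
    """Rebuild the approximating tensor from its three factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


# Toy temporal network: two groups of three nodes, active at different times.
X = np.zeros((6, 6, 4))
X[:3, :3, :2] = 1.0
X[3:, 3:, 2:] = 1.0
A, B, C = ntf(X, rank=2)
```

In this reading, each column of `A` (or `B`) scores how strongly each node belongs to a component, and the matching column of `C` is that component's activity time series. Results depend on the random initialization, so in practice several restarts and a quality metric for choosing `rank` would be needed.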
One can point to a variety of historical milestones for gender equality in STEM (science, technology, engineering, and mathematics); however, practical effects are incremental and ongoing. It is important to quantify gender differences in subdomains of scientific work in order to detect potential biases and monitor progress. In this work, we study the relevance of gender in scientific collaboration patterns in the Institute for Operations Research and the Management Sciences (INFORMS), a professional society with sixteen peer-reviewed journals. Using their publication data from 1952 to 2016, we constructed a large temporal bipartite network between authors and publications, and augmented the author nodes with gender labels. We characterized differences in several basic statistics of this network over time, highlighting how they have changed with respect to relevant historical events. We find a steady increase in participation by women (e.g., fraction of authorships by women and of new women authors) starting around 1980. However, women still comprise less than 25% of the INFORMS society and an even smaller fraction of authors with many publications. Moreover, we describe a methodology for quantifying the structural role of an authorship with respect to the overall connectivity of the network, using it to measure subtle differences between authorships by women and by men. Specifically, as measures of structural importance of an authorship, we use effective resistance and contraction importance, two measures related to diffusion throughout a network. As a null model, we propose a degree-preserving temporal and geometric network model with emergent communities. Our results suggest the presence of systematic differences between the collaboration patterns of men and women that cannot be explained by local statistics alone.
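Of the two structural measures named above, effective resistance has a standard closed form that can be sketched briefly. For a connected undirected graph with Laplacian L and Moore-Penrose pseudoinverse L+, the effective resistance between nodes i and j is L+[i, i] + L+[j, j] - 2 * L+[i, j]. The sketch below assumes an unweighted, connected graph given as an adjacency matrix; the function name `effective_resistance` is hypothetical, and how the authors apply the measure to authorships in their bipartite network is not specified in the abstract.

```python
import numpy as np


def effective_resistance(adj, i, j):
    """Effective resistance between nodes i and j of a connected graph.

    adj: symmetric adjacency matrix (entries may be edge weights).
    """
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    lap_pinv = np.linalg.pinv(laplacian)
    return lap_pinv[i, i] + lap_pinv[j, j] - 2 * lap_pinv[i, j]


# Path graph 0 - 1 - 2: the only route between the end nodes has two edges.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]

# Triangle: every pair is joined by a direct edge and a two-edge detour.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
```

An edge whose endpoints have high effective resistance is one with few alternative routes around it, which is why the quantity is sensitive to connectivity beyond local statistics such as degree.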