Objectives: To test the feasibility of using Twitter data to assess determinants of consumers' health behavior towards human papillomavirus (HPV) vaccination, informed by the Integrated Behavior Model (IBM). Methods: We used three Twitter datasets spanning 2014 to 2018. We preprocessed and geocoded the tweets, and then built a rule-based model that classified each tweet as either promotional information or consumer discussion. We applied topic modeling to discover major themes, and subsequently explored the associations between the topics learned from consumer discussions and the responses to HPV-related questions in the Health Information National Trends Survey (HINTS). Results: We collected 2,846,495 tweets and analyzed 335,681 geocoded tweets. Through topic modeling, we identified 122 high-quality topics. The most discussed consumer topic is cervical cancer screening, while in promotional tweets the most popular topic is increasing awareness that HPV causes cancer. Of the 122 topics, 87 are correlated between promotional information and consumer discussions. Guided by IBM, we examined the alignment between our Twitter findings and the results obtained from HINTS: 35 topics can be mapped to HINTS questions by keywords, 112 topics can be mapped to IBM constructs, and 45 topics have statistically significant correlations with HINTS responses in terms of geographic distributions. Conclusion: Mining Twitter to assess consumers' health behaviors can not only obtain results comparable to surveys but also yield additional insights via a theory-driven approach. Limitations exist; nevertheless, these encouraging results impel us to develop innovative ways of leveraging social media in the changing health communication landscape.
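The geographic comparison described in the abstract above can be illustrated with a minimal sketch. The abstract does not state which correlation statistic was used, so Pearson's r is an assumption here, and the region labels and values are made-up toy inputs, not results from the study. The names `pearson`, `topic_share`, and `survey_rate` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Toy inputs (assumed, for illustration only): per-region share of tweets
# on one topic, and per-region rate of one survey response.
topic_share = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40}
survey_rate = {"A": 0.2, "B": 0.4, "C": 0.6, "D": 0.8}

# Compare only regions present in both sources, in a fixed order.
regions = sorted(set(topic_share) & set(survey_rate))
r = pearson([topic_share[s] for s in regions],
            [survey_rate[s] for s in regions])
```

This sketch computes only the coefficient. Judging statistical significance, as the abstract reports, would additionally require a hypothesis test, which is omitted here.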
The outbreak of COVID-19 highlights the need for a more harmonized, less privacy-concerning, easily accessible approach to monitoring human mobility, which has been shown to be associated with viral transmission. In this study, we analyzed 587 million tweets worldwide to see how global collaborative efforts in reducing human mobility are reflected in user-generated information at the global, country, and U.S. state scales. Considering the multifaceted nature of mobility, we propose two types of distance: the single-day distance and the cross-day distance. To quantify the responsiveness of particular geographical regions, we further propose a mobility-based responsive index (MRI) that captures the overall degree of mobility changes within a time window. The results suggest that mobility patterns obtained from Twitter data are amenable to quantitatively reflecting mobility dynamics. Globally, the two proposed distances deviated greatly from their baselines after March 11, 2020, when the WHO declared COVID-19 a pandemic. The considerably weaker periodicity after the declaration suggests that the protection measures have clearly affected people's travel routines. The country-scale comparisons reveal discrepancies in responsiveness, evidenced by the contrasting mobility patterns in different epidemic phases. We find that the triggers of mobility changes correspond well with the national announcements of mitigation measures. In the U.S., the influence of the COVID-19 pandemic on mobility is distinct; however, the impacts varied substantially among states. The strong momentum of mobility recovery is further fueled by the Black Lives Matter protests, potentially fostering a second wave of infections in the U.S.
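The abstract above names a single-day distance and a cross-day distance without defining them. The following is a minimal sketch under stated assumptions, not the authors' definitions: the single-day distance is taken as the path length through one user's geotagged points within a day, and the cross-day distance as the great-circle distance between the mean locations of two days. All function names are hypothetical, and the MRI is not sketched because its formula is not given.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def single_day_distance(points):
    """Assumed definition: summed hop lengths between consecutive points."""
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))

def mean_location(points):
    """Arithmetic mean of latitudes and longitudes.

    A crude approximation that is only reasonable for nearby points
    away from the poles and the antimeridian.
    """
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def cross_day_distance(day_a, day_b):
    """Assumed definition: distance between the two days' mean locations."""
    return haversine_km(mean_location(day_a), mean_location(day_b))

# Toy coordinates (assumed): one degree of longitude along the equator.
one_degree_km = haversine_km((0.0, 0.0), (0.0, 1.0))
round_trip_km = single_day_distance([(0.0, 0.0), (0.0, 1.0), (0.0, 0.0)])
shift_km = cross_day_distance([(0.0, 0.0)], [(0.0, 1.0)])
```

Under these assumptions, a user who stays in one place has a single-day distance of zero, and the cross-day distance is zero whenever the daily mean location does not move.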
Cycles are fundamental to human health and behavior. However, modeling cycles in time series data is challenging because in most cases the cycles are not labeled or directly observed and need to be inferred from multidimensional measurements taken over time. Here, we present CyHMMs, a cyclic hidden Markov model method for detecting and modeling cycles in a collection of multidimensional heterogeneous time series data. In contrast to previous cycle modeling methods, CyHMMs deal with a number of challenges encountered in modeling real-world cycles: they can model multivariate data with discrete and continuous dimensions; they explicitly model and are robust to missing data; and they can share information across individuals to model variation both within and between individual time series. Experiments on synthetic and real-world health-tracking data demonstrate that CyHMMs infer cycle lengths more accurately than existing methods, with 58% lower error on simulated data and 63% lower error on real-world data compared to the best-performing baseline. CyHMMs can also perform functions which baselines cannot: they can model the progression of individual features/symptoms over the course of the cycle, identify the most variable features, and cluster individual time series into groups with distinct characteristics. Applying CyHMMs to two real-world health-tracking datasets -- of menstrual cycle symptoms and physical activity tracking data -- yields important insights including which symptoms to expect at each point during the cycle. We also find that people fall into several groups with distinct cycle patterns, and that these groups differ along dimensions not provided to the model. For example, by modeling missing data in the menstrual cycles dataset, we are able to discover a medically relevant group of birth control users even though information on birth control is not given to the model.
Social media offer a vast amount of geo-located and time-stamped textual content directly generated by people. This information can be analysed to obtain insights about the general state of a large population of users and to address scientific questions from a diversity of disciplines. In this work, we estimate temporal patterns of mood variation through the use of emotionally loaded words contained in Twitter messages, possibly reflecting underlying circadian and seasonal rhythms in the mood of the users. We present a method for computing mood scores from text using affective word taxonomies, and apply it to millions of tweets collected in the United Kingdom during the seasons of summer and winter. Our analysis results in the detection of strong and statistically significant circadian patterns for all the investigated mood types. Seasonal variation does not seem to register any important divergence in the signals, but a periodic oscillation within a 24-hour period is identified for each mood type. The main common characteristic of all emotions is their mid-morning peak; however, their mood score patterns differ in the evenings.
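The idea of scoring mood from emotionally loaded words, as described in the abstract above, can be sketched as follows. This is an illustration under assumptions, not the paper's method: the score is taken to be the fraction of a message's tokens that appear in a mood's word list, and the hourly profile is the mean score per hour of day. The word lists in `MOOD_WORDS` are tiny hypothetical stand-ins for an affective word taxonomy, and the tweets are toy examples.

```python
from collections import defaultdict

# Hypothetical, tiny word lists standing in for an affective word taxonomy.
MOOD_WORDS = {
    "joy": {"happy", "glad", "delighted"},
    "sadness": {"sad", "gloomy", "miserable"},
}

def mood_score(text, words):
    """Fraction of whitespace-separated tokens found in the mood word list."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in words for token in tokens) / len(tokens)

def hourly_profile(tweets, mood):
    """Mean mood score per hour of day.

    tweets: iterable of (hour, text) pairs, with hour in 0..23.
    Returns a dict mapping each observed hour to its mean score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, text in tweets:
        totals[hour] += mood_score(text, MOOD_WORDS[mood])
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}

# Toy messages (assumed, for illustration only).
tweets = [
    (9, "so happy and glad today"),
    (9, "nothing much"),
    (21, "feeling sad tonight"),
]
joy_profile = hourly_profile(tweets, "joy")
sadness_profile = hourly_profile(tweets, "sadness")
```

Plotting such a profile over the 24 hours of a day is one simple way to look for the kind of circadian pattern the abstract reports. Real use would need proper tokenization and a much larger lexicon.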
On social media platforms such as Twitter, users are often interested in gaining more influence and popularity by growing their set of followers, i.e., their audience. Several studies have described the properties of users on Twitter based on static snapshots of their follower network. Other studies have analyzed the general process of link formation. Here, rather than investigating the dynamics of this process itself, we study how the characteristics of the audience and follower links change as the audience of a user grows in size on the road to the user's popularity. To begin with, we find that the early followers tend to be more elite users than the late followers, i.e., they are more likely to have verified and expert accounts. Moreover, the early followers are significantly more similar to the person that they follow than the late followers. Namely, they are more likely to share time zone, language, and topics of interest with the followed user. To some extent, these phenomena are related to the growth of Twitter itself, wherein the early followers tend to be the early adopters of Twitter, while the late followers are late adopters. We isolate, however, the effect of the growth of audiences consisting of followers from the growth of Twitter's user base itself. Finally, we measure the engagement of such audiences with the content of the followed user, by measuring the probability that an early or late follower becomes a retweeter.
How do complex social systems evolve in the modern world? This question lies at the heart of social physics, and network analysis has proven critical in providing answers to it. In recent years, network analysis has also been used to gain a quantitative understanding of law as a complex adaptive system, but most research has focused on legal documents of a single type, and there exists no unified framework for quantitative legal document analysis using network analytical tools. Against this background, we present a comprehensive framework for analyzing legal documents as multi-dimensional, dynamic document networks. We demonstrate the utility of this framework by applying it to an original dataset of statutes and regulations from two different countries, the United States and Germany, spanning more than twenty years (1998-2019). Our framework provides tools for assessing the size and connectivity of the legal system as viewed through the lens of specific document collections as well as for tracking the evolution of individual legal documents over time. Implementing the framework for our dataset, we find that at the federal level, the United States legal system is increasingly dominated by regulations, whereas the German legal system remains governed by statutes. This holds regardless of whether we measure the systems at the macro, the meso, or the micro level.