Cycles are fundamental to human health and behavior. However, modeling cycles in time series data is challenging because in most cases the cycles are not labeled or directly observed and must be inferred from multidimensional measurements taken over time. Here, we present CyHMMs, a cyclic hidden Markov model method for detecting and modeling cycles in a collection of multidimensional heterogeneous time series data. In contrast to previous cycle modeling methods, CyHMMs address a number of challenges encountered in modeling real-world cycles: they can model multivariate data with discrete and continuous dimensions; they explicitly model and are robust to missing data; and they can share information across individuals to model variation both within and between individual time series. Experiments on synthetic and real-world health-tracking data demonstrate that CyHMMs infer cycle lengths more accurately than existing methods, with 58% lower error on simulated data and 63% lower error on real-world data compared to the best-performing baseline. CyHMMs can also perform functions that baselines cannot: they can model the progression of individual features/symptoms over the course of the cycle, identify the most variable features, and cluster individual time series into groups with distinct characteristics. Applying CyHMMs to two real-world health-tracking datasets (menstrual cycle symptoms and physical activity tracking data) yields important insights, including which symptoms to expect at each point during the cycle. We also find that people fall into several groups with distinct cycle patterns, and that these groups differ along dimensions not provided to the model. For example, by modeling missing data in the menstrual cycles dataset, we are able to discover a medically relevant group of birth control users even though information on birth control is not given to the model.
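To make the cyclic structure concrete, here is a minimal Python sketch. It assumes a simple cyclic topology in which each hidden state either persists or advances to the next state, wrapping from the last state back to the first. It also assumes 1-D Gaussian emissions. The sketch shows one standard way to tolerate missing values in an HMM: set the emission likelihood of a missing observation to 1, so that the forward recursion marginalizes it out. All function names are hypothetical, and this is not the CyHMM implementation. CyHMMs handle multivariate mixed discrete/continuous data, share information across individuals, and model missingness explicitly, which this simpler treatment does not.

```python
import math

def cyclic_transition_matrix(stay_probs):
    """K x K transition matrix (assumes K >= 2): state k stays put with
    probability stay_probs[k], otherwise advances to (k + 1) mod K."""
    k = len(stay_probs)
    A = [[0.0] * k for _ in range(k)]
    for i, p in enumerate(stay_probs):
        A[i][i] = p
        A[i][(i + 1) % k] = 1.0 - p
    return A

def expected_cycle_length(stay_probs):
    """Dwell time in state k is geometric with mean 1 / (1 - p_k); one full
    pass around the cycle takes the sum of these means."""
    return sum(1.0 / (1.0 - p) for p in stay_probs)

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def forward_log_likelihood(obs, A, init, means, sigmas):
    """Scaled forward algorithm for 1-D Gaussian emissions.

    A None in obs marks a missing value: its emission likelihood is set to 1,
    so the observation is marginalized out rather than imputed.
    """
    k = len(init)
    alpha = list(init)
    log_lik = 0.0
    for t, x in enumerate(obs):
        if t > 0:
            # Propagate the filtered state distribution one step forward.
            alpha = [sum(alpha[i] * A[i][j] for i in range(k)) for j in range(k)]
        emit = [1.0 if x is None else gaussian_pdf(x, means[j], sigmas[j]) for j in range(k)]
        alpha = [a * e for a, e in zip(alpha, emit)]
        scale = sum(alpha)  # p(x_t | x_1, ..., x_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik
```

Under this assumed topology, the expected cycle length is the sum of the per-state mean dwell times. For example, four states that each persist with probability 0.8 give an expected cycle of 20 time steps.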
Discovering patterns and detecting anomalies in individual travel behavior is a crucial problem in both research and practice. In this paper, we address this problem by building a probabilistic framework to model individual spatiotemporal travel behavior data (e.g., trip records and trajectory data). We develop a two-dimensional latent Dirichlet allocation (LDA) model to characterize the generative mechanism of each traveler's spatiotemporal trip records. This model introduces two separate factor matrices, one for the spatial dimension and one for the temporal dimension, and uses a two-dimensional core structure at the individual level to model their joint interactions and complex dependencies. The model can efficiently summarize travel behavior patterns along both spatial and temporal dimensions from very sparse trip sequences in an unsupervised way. In this way, complex travel behavior can be modeled as a mixture of representative and interpretable spatiotemporal patterns. By applying the trained model to a traveler's future/unseen spatiotemporal records, we can detect anomalies in that traveler's behavior by scoring those observations using perplexity. We demonstrate the effectiveness of the proposed modeling framework on a real-world license plate recognition (LPR) data set. The results confirm the advantage of statistical learning methods in modeling sparse individual travel behavior data. Such pattern discovery and anomaly detection applications can provide useful insights for traffic monitoring, law enforcement, and individual travel behavior profiling.
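The perplexity-based scoring step can be sketched as follows. For illustration, we assume that a traveler's probability of a (location, time-slot) record factorizes as p(s, t) = Σ_k Σ_l η_kl φ_k(s) ψ_l(t). Here φ_k are spatial patterns, ψ_l are temporal patterns, and η is the traveler's core matrix over pattern pairs. Perplexity is then exp(-(1/N) Σ_i log p(s_i, t_i)), and higher values mean the records are less expected under the trained model. The parameter and function names are hypothetical. The fitted parameters are taken as given rather than learned here, and the toy values are made up.

```python
import math

def record_prob(s, t, core, spatial, temporal):
    """p(s, t) = sum over (k, l) of core[k][l] * spatial[k][s] * temporal[l][t]."""
    return sum(
        core[k][l] * spatial[k][s] * temporal[l][t]
        for k in range(len(spatial))
        for l in range(len(temporal))
    )

def perplexity(records, core, spatial, temporal):
    """exp of the negative mean log-likelihood of (location, time) records."""
    total = sum(math.log(record_prob(s, t, core, spatial, temporal)) for s, t in records)
    return math.exp(-total / len(records))

# Illustrative toy parameters: 2 spatial patterns over 3 locations,
# 2 temporal patterns over 2 time slots.
SPATIAL = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
TEMPORAL = [[0.9, 0.1], [0.2, 0.8]]
CORE = [[0.6, 0.0], [0.0, 0.4]]  # this traveler mixes pattern pairs (0,0) and (1,1)

typical = perplexity([(0, 0), (2, 1)], CORE, SPATIAL, TEMPORAL)
unusual = perplexity([(1, 1), (1, 0)], CORE, SPATIAL, TEMPORAL)
```

A uniform model over V locations and T time slots gives perplexity V·T, which is a useful sanity check. Records scoring well above a traveler's typical perplexity would be flagged as candidate anomalies under whatever threshold one chooses.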
Objectives To test the feasibility of using Twitter data to assess determinants of consumers' health behavior towards Human papillomavirus (HPV) vaccination, informed by the Integrated Behavior Model (IBM). Methods We used three Twitter datasets spanning 2014 to 2018. We preprocessed and geocoded the tweets, and then built a rule-based model that classified each tweet as either promotional information or consumers' discussions. We applied topic modeling to discover major themes, and subsequently explored the associations between the topics learned from consumers' discussions and the responses to HPV-related questions in the Health Information National Trends Survey (HINTS). Results We collected 2,846,495 tweets and analyzed 335,681 geocoded tweets. Through topic modeling, we identified 122 high-quality topics. The most discussed consumer topic is cervical cancer screening, while in promotional tweets the most popular topic is raising awareness that HPV causes cancer. Of the 122 topics, 87 are correlated between promotional information and consumers' discussions. Guided by IBM, we examined the alignment between our Twitter findings and the results obtained from HINTS: 35 topics can be mapped to HINTS questions by keywords, 112 topics can be mapped to IBM constructs, and 45 topics have statistically significant correlations with HINTS responses in terms of geographic distributions. Conclusion Mining Twitter to assess consumers' health behaviors can not only obtain results comparable to surveys but also yield additional insights via a theory-driven approach. Although limitations exist, these encouraging results impel us to develop innovative ways of leveraging social media in the changing health communication landscape.
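The geographic alignment between topics and survey responses can be illustrated with a Pearson correlation across regions. This minimal sketch takes three inputs for one topic: per-region counts of consumer tweets assigned to the topic, per-region totals of consumer tweets, and a per-region survey response rate. The dictionaries and function names are hypothetical. The study's exact correlation procedure and significance test are not specified in the abstract, and the numbers below are toy values, not study data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

def topic_prevalence(topic_counts, tweet_counts):
    """Share of each region's consumer tweets that were assigned to the topic."""
    return {r: topic_counts.get(r, 0) / tweet_counts[r] for r in tweet_counts}

def correlate_with_survey(prevalence, survey_rate):
    """Correlate topic prevalence with a survey response rate over shared regions."""
    regions = sorted(set(prevalence) & set(survey_rate))
    return pearson([prevalence[r] for r in regions], [survey_rate[r] for r in regions])

# Toy illustration only (not study data): three regions.
prev = topic_prevalence({"R1": 5, "R2": 12, "R3": 20}, {"R1": 100, "R2": 100, "R3": 100})
survey = {"R1": 0.40, "R2": 0.55, "R3": 0.70}
r = correlate_with_survey(prev, survey)
```

For a significance test, `scipy.stats.pearsonr` returns both the correlation coefficient and a p-value. The stdlib-only version above computes the coefficient alone.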
How can we model influence between individuals in a social system, even when the network of interactions is unknown? In this article, we review the literature on the influence model, which uses independent time series to estimate how much the state of one actor affects the state of another actor in the system. We extend this model to incorporate dynamical parameters that allow us to infer how influence changes over time, and we provide three examples of how this model can be applied to simulated and real data. The results show that the model can recover known estimates of influence, generates results that are consistent with other measures of social networks, and allows us to uncover important shifts in the way states may be transmitted between actors at different points in time.
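A single update of the influence model can be sketched as a convex combination of transition rows. The sketch assumes the common formulation in which actor i's next-state distribution is Σ_j R[i][j] · A[j][i][s_j]. Here R is a row-stochastic influence matrix, and A[j][i] is a transition matrix from actor j's current state to actor i's next state. The names and toy parameters are illustrative. Passing a different R at each step is one simple way to picture time-varying influence, though the article's dynamical parameterization may differ.

```python
import random

def next_state_distribution(i, states, R, A):
    """Next-state distribution of actor i given all actors' current states.

    R[i][j] >= 0 is the influence of actor j on actor i (each row sums to 1).
    A[j][i][a][b] is the probability that actor i moves to state b when
    actor j is currently in state a.
    """
    n_states = len(A[i][i][0])
    probs = [0.0] * n_states
    for j, s_j in enumerate(states):
        row = A[j][i][s_j]
        for b in range(n_states):
            probs[b] += R[i][j] * row[b]
    return probs

def step(states, R, A, rng):
    """Sample every actor's next state independently from its mixture."""
    next_states = []
    for i in range(len(states)):
        p = next_state_distribution(i, states, R, A)
        next_states.append(rng.choices(range(len(p)), weights=p)[0])
    return next_states

# Two actors, two states: actor 0 is half self-driven and half copies actor 1;
# actor 1 is entirely self-driven.
A_SELF = [[0.9, 0.1], [0.1, 0.9]]
A_COPY = [[1.0, 0.0], [0.0, 1.0]]
A = [[A_SELF, A_COPY], [A_COPY, A_SELF]]
R = [[0.5, 0.5], [0.0, 1.0]]

dist = next_state_distribution(0, [0, 1], R, A)  # approximately [0.45, 0.55]
rng = random.Random(0)
trajectory = [[0, 1]]
for _ in range(5):
    trajectory.append(step(trajectory[-1], R, A, rng))
```

Because each row of R sums to 1 and each transition row is a distribution, every mixture is itself a valid probability distribution.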
The outbreak of COVID-19 highlights the need for a more harmonized, less privacy-invasive, and easily accessible approach to monitoring human mobility, which has been shown to be associated with viral transmission. In this study, we analyzed 587 million tweets worldwide to see how global collaborative efforts to reduce human mobility are reflected in user-generated information at the global, country, and U.S. state scales. Considering the multifaceted nature of mobility, we propose two types of distance: the single-day distance and the cross-day distance. To quantify responsiveness in given geographical regions, we further propose a mobility-based responsive index (MRI) that captures the overall degree of mobility change within a time window. The results suggest that mobility patterns obtained from Twitter data are amenable to quantitatively reflecting mobility dynamics. Globally, both proposed distances deviated greatly from their baselines after March 11, 2020, when the WHO declared COVID-19 a pandemic. The considerably weaker periodicity after the declaration suggests that protective measures have clearly affected people's travel routines. Country-scale comparisons reveal discrepancies in responsiveness, evidenced by contrasting mobility patterns in different epidemic phases. We find that the triggers of mobility changes correspond well with national announcements of mitigation measures. In the U.S., the influence of the COVID-19 pandemic on mobility is evident; however, the impacts varied substantially among states. The strong momentum of mobility recovery was further fueled by the Black Lives Matter protests, potentially fostering a second wave of infections in the U.S.
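The abstract does not give formulas for the two distances or the MRI, so the following is only a hypothetical operationalization for illustration. It has three parts:

- Great-circle (haversine) distances between geotagged tweet locations given as (latitude, longitude) in degrees.
- A single-day distance, taken as the path length through one user's time-ordered locations on one day.
- An MRI-style score, taken as the mean relative change of a daily mobility measure from its baseline over a window.

Every function name and definition here is an assumption, not the paper's definition.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def single_day_distance(points):
    """Hypothetical: path length through one user's time-ordered locations in a day."""
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))

def responsiveness_index(daily_values, baseline):
    """Hypothetical MRI-style score: mean relative change from baseline over a window."""
    return sum((v - baseline) / baseline for v in daily_values) / len(daily_values)
```

Under this hypothetical score, a value of -0.5 would mean that mobility averaged half its baseline over the window.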
We propose a mixed-methods approach to understanding the human infrastructure underlying StreetNet (SNET), a distributed, community-run intranet that serves as the primary Internet in Havana, Cuba. We bridge ethnographic studies and the study of social networks and organizations to understand the way that power is embedded in the structure of Havana's SNET. By quantitatively and qualitatively unpacking the human infrastructure of SNET, this work reveals how distributed infrastructure necessarily embeds the structural aspects of inequality distributed within that infrastructure. While traditional technical measurements of networks reflect the social, organizational, spatial, and technical constraints that shape the resulting network, ethnographies can help uncover the texture and role of these hidden supporting relationships. By merging these perspectives, this work contributes to our understanding of network roles in growing and maintaining distributed infrastructures, revealing new approaches to understanding larger, more complex Internet-human infrastructures, including the Internet and the WWW.