No Arabic abstract
Since the uptake of social media, researchers have mined online discussions to track the outbreak and evolution of specific diseases or chronic conditions such as influenza or depression. To broaden the set of diseases under study, we developed a Deep Learning tool for Natural Language Processing that extracts mentions of virtually any medical condition or disease from unstructured social media text. With that tool at hand, we processed Reddit and Twitter posts, analyzed the clusters of the two resulting co-occurrence networks of conditions, and discovered that they correspond to well-defined categories of medical conditions. This resulted in the creation of the first comprehensive taxonomy of medical conditions automatically derived from online discussions. We validated the structure of our taxonomy against the official International Statistical Classification of Diseases and Related Health Problems (ICD-11), finding matches of our clusters with 20 official categories, out of 22. Based on the mentions of our taxonomys sub-categories on Reddit posts geo-referenced in the U.S., we were then able to compute disease-specific health scores. As opposed to counts of disease mentions or counts with no knowledge of our taxonomys structure, we found that our disease-specific health scores are causally linked with the officially reported prevalence of 18 conditions.
Mental illnesses adversely affect a significant proportion of the population worldwide. However, the methods traditionally used for estimating and characterizing the prevalence of mental health conditions are time-consuming and expensive. Consequently, best-available estimates concerning the prevalence of mental health conditions are often years out of date. Automated approaches to supplement these survey methods with broad, aggregated information derived from social media content provides a potential means for near real-time estimates at scale. These may, in turn, provide grist for supporting, evaluating and iteratively improving upon public health programs and interventions. We propose a novel model for automated mental health status quantification that incorporates user embeddings. This builds upon recent work exploring representation learning methods that induce embeddings by leveraging social media post histories. Such embeddings capture latent characteristics of individuals (e.g., political leanings) and encode a soft notion of homophily. In this paper, we investigate whether user embeddings learned from twitter post histories encode information that correlates with mental health statuses. To this end, we estimated user embeddings for a set of users known to be affected by depression and post-traumatic stress disorder (PTSD), and for a set of demographically matched `control users. We then evaluated these embeddings with respect to: (i) their ability to capture homophilic relations with respect to mental health status; and (ii) the performance of downstream mental health prediction models based on these features. Our experimental results demonstrate that the user embeddings capture similarities between users with respect to mental conditions, and are predictive of mental health.
The unprecedented growth of Internet users has resulted in an abundance of unstructured information on social media including health forums, where patients request health-related information or opinions from other users. Previous studies have shown that online peer support has limited effectiveness without expert intervention. Therefore, a system capable of assessing the severity of health state from the patients social media posts can help health professionals (HP) in prioritizing the users post. In this study, we inspect the efficacy of different aspects of Natural Language Understanding (NLU) to identify the severity of the users health state in relation to two perspectives(tasks) (a) Medical Condition (i.e., Recover, Exist, Deteriorate, Other) and (b) Medication (i.e., Effective, Ineffective, Serious Adverse Effect, Other) in online health communities. We propose a multiview learning framework that models both the textual content as well as contextual-information to assess the severity of the users health state. Specifically, our model utilizes the NLU views such as sentiment, emotions, personality, and use of figurative language to extract the contextual information. The diverse NLU views demonstrate its effectiveness on both the tasks and as well as on the individual disease to assess a users health.
This paper describes a prototype system that integrates social media analysis into the European Flood Awareness System (EFAS). This integration allows the collection of social media data to be automatically triggered by flood risk warnings determined by a hydro-meteorological model. Then, we adopt a multi-lingual approach to find flood-related messages by employing two state-of-the-art methodologies: language-agnostic word embeddings and language-aligned word embeddings. Both approaches can be used to bootstrap a classifier of social media messages for a new language with little or no labeled data. Finally, we describe a method for selecting relevant and representative messages and displaying them back in the interface of EFAS.
We conduct a large-scale social media-based study of oral health during the COVID-19 pandemic based on tweets from 9,104 Twitter users across 26 states (with sufficient samples) in the United States for the period between November 12, 2020 and June 14, 2021. To better understand how discussions on different topics/oral diseases vary across the users, we acquire or infer demographic information of users and other characteristics based on retrieved information from user profiles. Women and younger adults (19-29) are more likely to talk about oral health problems. We use the LDA topic model to extract the major topics/oral diseases in tweets. Overall, 26.70% of the Twitter users talk about wisdom tooth pain/jaw hurt, 23.86% tweet about dental service/cavity, 18.97% discuss chipped tooth/tooth break, 16.23% talk about dental pain, and the rest are about tooth decay/gum bleeding. By conducting logistic regression, we find that discussions vary across user characteristics. More importantly, we find social disparities in oral health during the pandemic. Specifically, we find that health insurance coverage rate is the most significant predictor in logistic regression for topic prediction. People from counties with higher insurance coverage tend to tweet less about all topics of oral diseases. People from counties at a higher risk of COVID-19 talk more about tooth decay/gum bleeding and chipped tooth/tooth break. Older adults (50+), who are vulnerable to COVID-19, are more likely to discuss dental pain. To our best knowledge, this is the first large-scale social media-based study to analyze and understand oral health in America amid the COVID-19 pandemic. We hope the findings of our study through the lens of social media can provide insights for oral health practitioners and policy makers.
Despite the influence that image-based communication has on online discourse, the role played by images in disinformation is still not well understood. In this paper, we present the first large-scale study of fauxtography, analyzing the use of manipulated or misleading images in news discussion on online communities. First, we develop a computational pipeline geared to detect fauxtography, and identify over 61k instances of fauxtography discussed on Twitter, 4chan, and Reddit. Then, we study how posting fauxtography affects engagement of posts on social media, finding that posts containing it receive more interactions in the form of re-shares, likes, and comments. Finally, we show that fauxtography images are often turned into memes by Web communities. Our findings show that effective mitigation against disinformation need to take images into account, and highlight a number of challenges in dealing with image-based disinformation.