Quantifying Mental Health from Social Media with Neural User Embeddings


Abstract in English

Mental illnesses adversely affect a significant proportion of the population worldwide. However, the methods traditionally used for estimating and characterizing the prevalence of mental health conditions are time-consuming and expensive. Consequently, best-available estimates concerning the prevalence of mental health conditions are often years out of date. Automated approaches to supplement these survey methods with broad, aggregated information derived from social media content provides a potential means for near real-time estimates at scale. These may, in turn, provide grist for supporting, evaluating and iteratively improving upon public health programs and interventions. We propose a novel model for automated mental health status quantification that incorporates user embeddings. This builds upon recent work exploring representation learning methods that induce embeddings by leveraging social media post histories. Such embeddings capture latent characteristics of individuals (e.g., political leanings) and encode a soft notion of homophily. In this paper, we investigate whether user embeddings learned from twitter post histories encode information that correlates with mental health statuses. To this end, we estimated user embeddings for a set of users known to be affected by depression and post-traumatic stress disorder (PTSD), and for a set of demographically matched `control users. We then evaluated these embeddings with respect to: (i) their ability to capture homophilic relations with respect to mental health status; and (ii) the performance of downstream mental health prediction models based on these features. Our experimental results demonstrate that the user embeddings capture similarities between users with respect to mental conditions, and are predictive of mental health.

Download