ﻻ يوجد ملخص باللغة العربية
User representations are routinely used in recommendation systems by platform developers, targeted advertisements by marketers, and by public policy researchers to gauge public opinion across demographic groups. Computer scientists consider the problem of inferring user representations more abstractly; how does one extract a stable user representation - effective for many downstream tasks - from a medium as noisy and complicated as social media? The quality of a user representation is ultimately task-dependent (e.g. does it improve classifier performance, make more accurate recommendations in a recommendation system) but there are proxies that are less sensitive to the specific task. Is the representation predictive of latent properties such as a persons demographic features, socioeconomic class, or mental health state? Is it predictive of the users future behavior? In this thesis, we begin by showing how user representations can be learned from multiple types of user behavior on social media. We apply several extensions of generalized canonical correlation analysis to learn these representations and evaluate them at three tasks: predicting future hashtag mentions, friending behavior, and demographic features. We then show how user features can be employed as distant supervision to improve topic model fit. Finally, we show how user features can be integrated into and improve existing classifiers in the multitask learning framework. We treat user representations - ground truth gender and mental health features - as auxiliary tasks to improve mental health state prediction. We also use distributed user representations learned in the first chapter to improve tweet-level stance classifiers, showing that distant user information can inform classification tasks at the granularity of a single message.
There has been an explosion of multimodal content generated on social media networks in the last few years, which has necessitated a deeper understanding of social media content and user behavior. We present a novel content-independent content-user-r
Recent progress in AutoML has lead to state-of-the-art methods (e.g., AutoSKLearn) that can be readily used by non-experts to approach any supervised learning problem. Whereas these methods are quite effective, they are still limited in the sense tha
Social Media has seen a tremendous growth in the last decade and is continuing to grow at a rapid pace. With such adoption, it is increasingly becoming a rich source of data for opinion mining and sentiment analysis. The detection and analysis of sen
Twitter is currently a popular online social media platform which allows users to share their user-generated content. This publicly-generated user data is also crucial to healthcare technologies because the discovered patterns would hugely benefit th
Learning socially-aware motion representations is at the core of recent advances in multi-agent problems, such as human motion forecasting and robot navigation in crowds. Despite promising progress, existing representations learned with neural networ