Analysing user identity via time-sensitive semantic edit distance (t-SED): A case study of Russian trolls on Twitter


Abstract in English

In the digital era, individuals are increasingly profiled and grouped based on the traces they leave behind in online social networks such as Twitter and Facebook. In this paper, we develop and evaluate a novel text analysis approach for studying user identity and social roles by redefining identity as a sequence of timestamped items (e.g. tweet texts). We operationalise this idea by developing a novel text distance metric, the time-sensitive semantic edit distance (t-SED), which accounts for the temporal context across multiple traces. To evaluate this method we undertake a case study of Russian online-troll activity within US political discourse. The novel metric allows us to classify the social roles of trolls based on their traces, in this case tweets, into one of the predefined categories left-leaning, right-leaning, and news feed. We show the effectiveness of the t-SED metric to measure the similarities between tweets while accounting for the temporal context, and we use novel data visualisation techniques and qualitative analysis to uncover new empirical insights into Russian troll activity that have not been identified in previous work. Additionally, we highlight a connection with the field of Actor-Network Theory and the related hypotheses of Gabriel Tarde, and we discuss how social sequence analysis using t-SED may provide new avenues for tackling a longstanding problem in social theory: how to analyse society without separating reality into micro versus macro levels.

Download