The deep and dark web (d2web) refers to limited-access websites that require registration, authentication, or more complex encryption protocols to access. These websites serve as hubs for a variety of illicit activities: trading drugs, stolen user credentials, and hacking tools, and coordinating attacks and manipulation campaigns. Despite its importance to cybercrime, the d2web has not been systematically investigated. In this paper, we study a large corpus of messages posted to 80 d2web forums over a period of more than a year. We identify topics of discussion using Latent Dirichlet Allocation (LDA) and use a non-parametric hidden Markov model (HMM) to model the evolution of topics across forums. Then, we examine the dynamic patterns of discussion and identify forums with similar patterns. We show that our approach surfaces hidden similarities across different forums and can help identify anomalous events in this rich, heterogeneous data.
Early analyses revealed that dark web marketplaces (DWMs) started offering COVID-19 related products (e.g., masks and COVID-19 tests) as soon as the pandemic started, when these goods were in short supply in the traditional economy. Here, we broaden the scope and depth of previous investigations by analysing 194 DWMs until July 2021, including the crucial period in which vaccines became available, and by considering the wider impact of the pandemic on DWMs. First, we focus on vaccines. We find 250 listings offering approved vaccines, like Pfizer/BioNTech and AstraZeneca, as well as vendors offering fabricated proofs of vaccination and COVID-19 passports. Second, we consider COVID-19 related products. We reveal that, as the regular economy has become able to satisfy the demand for these goods, DWMs have reduced their offerings. Third, we analyse the profile of vendors of COVID-19 related products and vaccines. We find that most of them specialise in a single type of listing and are willing to ship worldwide. Finally, we consider a broader set of listings simply mentioning COVID-19. Among 10,330 such listings, we show that recreational drugs are the most affected among traditional DWM products, with COVID-19 mentions steadily increasing since March 2020. We anticipate that our effort is of interest to researchers, practitioners, and law enforcement agencies focused on the study and safeguarding of public health.
The COVID-19 pandemic has reshaped the demand for goods and services worldwide. The combination of a public health emergency, economic distress, and misinformation-driven panic has pushed customers and vendors towards the shadow economy. In particular, dark web marketplaces (DWMs), commercial websites accessible via free software, have gained significant popularity. Here, we analyse 851,199 listings extracted from 30 DWMs between January 1, 2020 and November 16, 2020. We identify 788 listings directly related to COVID-19 products and monitor the temporal evolution of product categories including Personal Protective Equipment (PPE), medicines (e.g., hydroxychloroquine), and medical frauds. Finally, we compare trends in their temporal evolution with variations in public attention, as measured by Twitter posts and Wikipedia page visits. We reveal how the online shadow economy has evolved during the COVID-19 pandemic and highlight the importance of continuous monitoring of DWMs, especially now that real vaccines are available and in short supply. We anticipate our analysis will be of interest both to researchers and public agencies focused on the protection of public health.
Dark markets are commercial websites that use Bitcoin to sell or broker transactions involving drugs, weapons, and other illicit goods. Being illegal, they do not offer any user protection, and several police raids and scams have caused large losses to both customers and vendors over the past years. However, this uncertainty has not prevented a steady growth of the dark market phenomenon and a proliferation of new markets. The origin of this resilience has remained unclear so far, partly due to the difficulty of identifying relevant Bitcoin transaction data. Here, we investigate how the dark market ecosystem re-organises following the disappearance of a market, due to factors including raids and scams. To do so, we analyse 24 episodes of unexpected market closure through a novel dataset of 133 million Bitcoin transactions involving 31 dark markets and their users, totalling 4 billion USD. We show that coordinated user migration from the closed market to coexisting markets guarantees overall systemic resilience beyond the intrinsic fragility of individual markets. The migration is swift, efficient and common to all market closures. We find that migrants are on average more active users in comparison to non-migrants and move preferentially towards the coexisting market with the highest trading volume. Our findings shed light on the resilience of the dark market ecosystem and we anticipate that they may inform future research on the self-organisation of emerging online markets.
Most current approaches to characterize and detect hate speech focus on content posted in Online Social Networks (OSNs). They face difficulties in collecting and annotating hateful speech due to the incompleteness and noisiness of OSN text and the subjectivity of hate speech. These limitations are often addressed with constraints that oversimplify the problem, such as considering only tweets containing hate-related words. In this work we partially address these issues by shifting the focus towards users. We develop and employ a robust methodology to collect and annotate hateful users which does not depend directly on a lexicon and where the users are annotated given their entire profile. This results in a sample of Twitter's retweet graph containing 100,386 users, out of which 4,972 were annotated. We also collect the users who were banned in the three months that followed the data collection. We show that hateful users differ from normal ones in terms of their activity patterns, word usage, and network structure. We obtain similar results comparing the neighbors of hateful vs. neighbors of normal users and also suspended users vs. active users, increasing the robustness of our analysis. We observe that hateful users are densely connected, and thus formulate the hate speech detection problem as a task of semi-supervised learning over a graph, exploiting the network of connections on Twitter. We find that a node embedding algorithm, which exploits the graph structure, outperforms content-based approaches for the detection of both hateful (95% AUC vs 88% AUC) and suspended users (93% AUC vs 88% AUC). Altogether, we present a user-centric view of hate speech, paving the way for better detection and understanding of this relevant and challenging issue.
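To make the idea of semi-supervised learning over a graph concrete, the following is a minimal sketch using plain label propagation. It is not the node embedding algorithm evaluated in the abstract above, only a simpler stand-in that shows how a few annotated users can inform scores for their unannotated neighbors. The function name, the toy graph, and the neutral starting score of 0.5 are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seed_scores, iterations=50):
    """Spread scores from annotated nodes to the rest of an undirected graph.

    edges: iterable of (u, v) pairs (hypothetical retweet connections).
    seed_scores: dict mapping annotated nodes to a score in [0, 1],
                 where 1.0 means "hateful" and 0.0 means "normal".
    Returns a dict mapping every node to a score in [0, 1].
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    nodes = set(neighbors) | set(seed_scores)
    # Assumption: unannotated nodes start from a neutral score of 0.5.
    scores = {n: seed_scores.get(n, 0.5) for n in nodes}

    for _ in range(iterations):
        updated = {}
        for node in nodes:
            if node in seed_scores:
                # Annotated nodes stay clamped to their label.
                updated[node] = seed_scores[node]
            elif neighbors[node]:
                # Unannotated nodes take the mean score of their neighbors.
                nbrs = neighbors[node]
                updated[node] = sum(scores[m] for m in nbrs) / len(nbrs)
            else:
                # Isolated unannotated nodes keep their current score.
                updated[node] = scores[node]
        scores = updated
    return scores


# Toy graph: two triangles joined by the edge c-d.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),  # cluster around annotated hateful users
    ("c", "d"),                          # bridge between the clusters
    ("d", "e"), ("e", "f"), ("d", "f"),  # cluster around annotated normal users
]
seeds = {"a": 1.0, "b": 1.0, "e": 0.0, "f": 0.0}

scores = propagate_labels(edges, seeds)
for node in sorted(scores):
    print(node, round(scores[node], 3))
```

In this toy graph the unannotated user c ends up with a score of 0.75 and d with 0.25, because each settles at the average of its neighbors. A node embedding method would instead learn a feature vector per user and feed it to a classifier, which this sketch does not attempt.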
Under increasing scrutiny, many web companies now offer bespoke mechanisms allowing any third party to file complaints (e.g., requesting the de-listing of a URL from a search engine). While this self-regulation might be a valuable web governance tool, it places huge responsibility in the hands of these organisations, which demands close examination. We present the first large-scale study of web complaints (over 1 billion URLs). We find a range of complainants, largely focused on copyright enforcement. Whereas the majority of organisations are occasional users of the complaint system, we find a number of bulk senders specialised in targeting specific types of domain. We identify a series of trends and patterns amongst both the domains and complainants. By inspecting the availability of the domains, we also observe that a sizeable portion go offline shortly after complaints are generated. This paper sheds critical light on how complaints are issued, who they pertain to and which domains go offline after complaints are issued.