Understanding how people interact with the web is key for a variety of applications, from the design of effective web pages to the definition of successful online marketing campaigns. Browsing behavior has traditionally been represented and studied by means of clickstreams, i.e., graphs whose vertices are web pages and whose edges are the paths followed by users. Obtaining large and representative data to extract clickstreams is, however, challenging. The evolution of the web raises the question of whether browsing behavior is changing and, as a consequence, whether properties of clickstreams are changing. This paper presents a longitudinal study of clickstreams from 2013 to 2016. We evaluate an anonymized dataset of HTTP traces captured in a large ISP, where thousands of households are connected. We first propose a methodology to identify actual URLs requested by users from the massive set of requests automatically fired by browsers when rendering web pages. Then, we characterize web usage patterns and clickstreams, taking into account both the temporal evolution and the impact of the device used to explore the web. Our analyses precisely quantify various aspects of clickstreams and uncover interesting patterns, such as the typical short paths followed by people while navigating the web, the fast-increasing trend in browsing from mobile devices, and the different roles of search engines and social networks in promoting content. Finally, we contribute a dataset of anonymized clickstreams to the community to foster new studies (anonymized clickstreams are available to the public at http://bigdata.polito.it/clickstream).
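The clickstream representation described in this abstract (web pages as vertices, user navigation as edges) can be illustrated with a minimal sketch. It is not the authors' methodology: it assumes the user-requested URLs have already been separated from automatically fired requests, and that each click arrives as a hypothetical time-ordered `(referer, url)` pair, with `referer` set to `None` when the page was opened directly. The function names `build_clickstream` and `path_depths` are illustrative only.

```python
from collections import defaultdict


def build_clickstream(clicks):
    """Build a directed graph with an edge referer -> url for each click.

    `clicks` is an iterable of (referer, url) pairs; referer may be None.
    """
    graph = defaultdict(set)
    for referer, url in clicks:
        graph[url]  # touching the key makes every visited page a vertex
        if referer is not None:
            graph[referer].add(url)
    return dict(graph)


def path_depths(clicks):
    """Return the number of clicks separating each page from a path start.

    Assumption: clicks are time-ordered. A page with no referer, or whose
    referer was never seen, starts a new path at depth 0. Only the first
    visit to a page is counted.
    """
    depth = {}
    for referer, url in clicks:
        if url in depth:
            continue
        if referer is None or referer not in depth:
            depth[url] = 0
        else:
            depth[url] = depth[referer] + 1
    return depth
```

Under these assumptions, the distribution of the values returned by `path_depths` gives a rough view of how long navigation paths are, which is the kind of property the study quantifies.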
Nowadays, the development of Web applications supporting distributed user interfaces (DUI) is straightforward. However, it is still hard to find Web sites supporting this kind of user interaction. Although studies in this field have demonstrated that DUI would improve the user experience, users are not widely empowered to manage these kinds of interactions. In this setting, we propose to move the responsibility for distributing both the UI and user interaction from the application (a Web application) to the client (the Web browser), also giving rise to inter-application interaction distribution. This paper presents a platform for client-side DUI, built on the foundations of Web augmentation and End User Development. The idea is to empower end users to apply an augmentation layer over existing Web applications, considering both frequent use and opportunistic DUI requirements. In this work, we present the architecture and a prototype tool supporting this approach and illustrate the incorporation of some DUI features through case studies.
Motives or goals are recognized in the psychology literature as the most fundamental drives that explain and predict why people do what they do, including when they browse the web. Although they provide enormous value, these higher-ordered goals are often unobserved, and little is known about how to leverage such goals to assist people's browsing activities. This paper proposes a new approach to address this problem through a novel neural framework, Goal-directed Web Browsing (GoWeB). We adopt a psychologically sound taxonomy of higher-ordered goals and learn to build their representations in a structure-preserving manner. Then we incorporate the resulting representations to enhance the experience of common activities people perform on the web. Experiments on large-scale data from the Microsoft Edge web browser show that GoWeB significantly outperforms competitive baselines for in-session web page recommendation, re-visitation classification, and goal-based web page grouping. A follow-up analysis further characterizes how the variety of human motives can affect the differences observed in human behavioral patterns.
Gaining deeper insights into the online browsing behavior of Web users has been a major research topic since the advent of the WWW. It provides useful information to optimize website design, Web browser design, search engine offerings, and online advertisement. We argue that new technologies and new services continue to have significant effects on the way people browse the Web. For example, listening to music clips on YouTube or to a radio station on Last.fm does not require users to sit in front of their computer. Social media and networking sites like Facebook or micro-blogging sites like Twitter have attracted new types of users that previously were less inclined to go online. These changes in how people browse the Web feature new characteristics that are not yet well understood. In this paper, we provide novel and unique insights by presenting first results of DOBBS, our long-term effort to create a comprehensive and representative dataset capturing online user behavior. We first investigate the concepts of parallel browsing and passive browsing, showing that browsing the Web is no longer a dedicated task for many users. Based on these results, we then analyze their impact on the calculation of a user's dwell time -- i.e., the time the user spends on a webpage -- which has become an important metric to quantify the popularity of websites.
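To see why parallel and passive browsing matter for dwell time, the following sketch contrasts two ways of computing it. It does not reproduce the DOBBS data format or the authors' analysis: the event layout, the `focus`/`blur` labels, and the function names `naive_dwell` and `focus_dwell` are all assumptions made for illustration.

```python
from collections import defaultdict


def naive_dwell(loads):
    """Dwell time as the gap between consecutive page loads.

    `loads` is a time-ordered list of (timestamp, url). The last page has
    no following load, so it gets no dwell time.
    """
    return [(url, nxt_t - t) for (t, url), (nxt_t, _) in zip(loads, loads[1:])]


def focus_dwell(events):
    """Dwell time as the total time a page actually held the focus.

    `events` is a time-ordered list of (timestamp, url, kind), where kind
    is 'focus' or 'blur' (hypothetical labels). A 'blur' without a prior
    'focus' for the same page is ignored.
    """
    total = defaultdict(float)
    focused_since = {}
    for t, url, kind in events:
        if kind == "focus":
            focused_since[url] = t
        elif kind == "blur" and url in focused_since:
            total[url] += t - focused_since.pop(url)
    return dict(total)
```

If a user opens a video in a second tab and then returns to the first tab while the video keeps playing, the load-based estimate attributes the whole interval to the video page, whereas the focus-based estimate attributes most of it to the page the user was actually looking at.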
The deep and darkweb (d2web) refers to limited-access web sites that require registration, authentication, or more complex encryption protocols to access them. These web sites serve as hubs for a variety of illicit activities: trading drugs, stolen user credentials, and hacking tools, and coordinating attacks and manipulation campaigns. Despite its importance to cyber crime, the d2web has not been systematically investigated. In this paper, we study a large corpus of messages posted to 80 d2web forums over a period of more than a year. We identify topics of discussion using LDA and use a non-parametric HMM to model the evolution of topics across forums. Then, we examine the dynamic patterns of discussion and identify forums with similar patterns. We show that our approach surfaces hidden similarities across different forums and can help identify anomalous events in this rich, heterogeneous data.
Early analyses revealed that dark web marketplaces (DWMs) started offering COVID-19 related products (e.g., masks and COVID-19 tests) as soon as the current pandemic started, when these goods were in shortage in the traditional economy. Here, we broaden the scope and depth of previous investigations by analysing 194 DWMs until July 2021, including the crucial period in which vaccines became available, and by considering the wider impact of the pandemic on DWMs. First, we focus on vaccines. We find 250 listings offering approved vaccines, like Pfizer/BioNTech and AstraZeneca, as well as vendors offering fabricated proofs of vaccination and COVID-19 passports. Second, we consider COVID-19 related products. We reveal that, as the regular economy has become able to satisfy the demand for these goods, DWMs have decreased their offer. Third, we analyse the profile of vendors of COVID-19 related products and vaccines. We find that most of them specialize in a single type of listing and are willing to ship worldwide. Finally, we consider a broader set of listings simply mentioning COVID-19. Among 10,330 such listings, we show that recreational drugs are the most affected among traditional DWM products, with COVID-19 mentions steadily increasing since March 2020. We anticipate that our effort is of interest to researchers, practitioners, and law enforcement agencies focused on the study and safeguarding of public health.