Clickstreams on individual websites have been studied for decades to gain insights into user interests and to improve website experiences. This paper proposes and examines a novel sequence modeling approach for web clickstreams that also considers multi-tab branching and backtracking actions across websites, in order to capture a user's full action sequence while browsing. All of this is done with machine learning on the client side, which provides a more comprehensive view while preserving privacy. We evaluate our formalism with a model trained on data collected in a user study with three browsing tasks, each based on a different human information-seeking strategy from the psychological literature. Our results show that the model can successfully distinguish between browsing behaviors and correctly predict future actions. A subsequent qualitative analysis identified five common web browsing patterns in the collected behavior data, which help to interpret the model. More generally, this illustrates the power of overparameterization in ML and offers a new way of modeling, reasoning about, and predicting observable sequential human interaction behaviors.
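To make the idea of a single action sequence spanning several tabs concrete, here is a minimal sketch that flattens hypothetical browser events into tokens a generic sequence model could consume. The class name, the four action types, and the domain-level tokens are assumptions made for illustration; they are not the formalism proposed in the paper.

```python
from typing import Dict, List


class ClickstreamEncoder:
    """Flattens multi-tab browser events into a single action-token sequence.

    Hypothetical token vocabulary (illustration only):
      NAV:<domain>     a page is loaded in the active tab
      BRANCH:<domain>  a page is opened in a new tab (multi-tab branching)
      BACK:<domain>    the active tab returns to its previous page (backtracking)
      SWITCH:<domain>  focus moves to another open tab showing <domain>
    """

    def __init__(self) -> None:
        self.histories: Dict[int, List[str]] = {0: []}  # per-tab back stack
        self.active = 0
        self.next_tab = 1
        self.tokens: List[str] = []

    def navigate(self, domain: str) -> None:
        self.histories[self.active].append(domain)
        self.tokens.append(f"NAV:{domain}")

    def branch(self, domain: str) -> int:
        tab = self.next_tab
        self.next_tab += 1
        self.histories[tab] = [domain]  # the new tab starts its own history
        self.active = tab
        self.tokens.append(f"BRANCH:{domain}")
        return tab

    def back(self) -> None:
        stack = self.histories[self.active]
        if len(stack) > 1:  # backtracking needs an earlier page in this tab
            stack.pop()
            self.tokens.append(f"BACK:{stack[-1]}")

    def switch(self, tab: int) -> None:
        if tab in self.histories and tab != self.active:
            self.active = tab
            stack = self.histories[tab]
            self.tokens.append(f"SWITCH:{stack[-1] if stack else 'blank'}")


enc = ClickstreamEncoder()
enc.navigate("search.example")
enc.navigate("news.example")
enc.back()
enc.branch("shop.example")
enc.switch(0)
print(enc.tokens)
# ['NAV:search.example', 'NAV:news.example', 'BACK:search.example',
#  'BRANCH:shop.example', 'SWITCH:search.example']
```

Keeping a separate back stack per tab is what lets a BACK token name the page the user returns to, while BRANCH and SWITCH record movement between tabs; closing tabs is omitted for brevity.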
Gaining deeper insights into the online browsing behavior of Web users has been a major research topic since the advent of the WWW. It provides useful information to optimize website design, Web browser design, search engine offerings, and online advertising. We argue that new technologies and services continue to have significant effects on the way people browse the Web. For example, listening to music clips on YouTube or to a radio station on Last.fm does not require users to sit in front of their computer. Social networking sites such as Facebook and micro-blogging sites such as Twitter have attracted new types of users who were previously less inclined to go online. These changes in how people browse the Web introduce new characteristics that are not yet well understood. In this paper, we provide novel insights by presenting first results of DOBBS, our long-term effort to create a comprehensive and representative dataset capturing online user behavior. We first investigate the concepts of parallel browsing and passive browsing, showing that browsing the Web is no longer a dedicated task for many users. Based on these results, we then analyze their impact on the calculation of a user's dwell time (i.e., the time the user spends on a webpage), which has become an important metric for quantifying the popularity of websites.
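The dwell-time issue can be illustrated with a small sketch that computes two estimates from a hypothetical tab event log: the time a page is open and the time its tab has focus. The event names, the log format, and the function name are assumptions, not the DOBBS data format. With parallel or passive browsing the two measures diverge, and summed open times can exceed the wall-clock length of the session.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical event log entry: (seconds since session start, event, page id).
# Events are assumed to be in chronological order.
Event = Tuple[float, str, str]


def dwell_times(events: List[Event]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return two dwell-time estimates per page.

    open_time:  seconds between the page's "open" and "close" events
    focus_time: seconds during which the page's tab had focus
    """
    opened: Dict[str, float] = {}
    open_time: Dict[str, float] = defaultdict(float)
    focus_time: Dict[str, float] = defaultdict(float)
    focused: Optional[str] = None
    focus_start = 0.0

    for t, kind, page in events:
        if kind == "open":
            opened[page] = t
        elif kind == "focus":
            if focused is not None:  # the previously focused tab loses focus
                focus_time[focused] += t - focus_start
            focused, focus_start = page, t
        elif kind == "close":
            open_time[page] += t - opened.pop(page)
            if focused == page:  # closing the focused tab ends its focus interval
                focus_time[page] += t - focus_start
                focused = None
    return dict(open_time), dict(focus_time)


# A music tab keeps playing while the user reads an article in a second tab.
log = [
    (0, "open", "music"), (0, "focus", "music"),
    (10, "open", "article"), (10, "focus", "article"),
    (70, "close", "article"), (70, "focus", "music"),
    (80, "close", "music"),
]
open_t, focus_t = dwell_times(log)
print(open_t)   # {'article': 60.0, 'music': 80.0}
print(focus_t)  # {'music': 20.0, 'article': 60.0}
```

In this example the session lasts 80 seconds, yet open times sum to 140 seconds; which estimate better reflects "time spent on a page" depends on whether background activity such as listening to music should count.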
Nowadays, developing Web applications that support distributed user interfaces (DUI) is straightforward. However, Web sites supporting this kind of user interaction are still hard to find. Although studies in this field have demonstrated that DUI can improve the user experience, users are generally not empowered to manage such interactions. In this setting, we propose to move the responsibility for distributing both the UI and the user interaction from the application (a Web application) to the client (the Web browser), which also gives rise to inter-application interaction distribution. This paper presents a platform for client-side DUI built on the foundations of Web augmentation and End User Development. The idea is to empower end users to apply an augmentation layer over existing Web applications, considering both frequent-use and opportunistic DUI requirements. In this work, we present the architecture and a prototype tool supporting this approach, and illustrate the incorporation of some DUI features through case studies.
Investigating the browsing behavior of users provides useful information to optimize web site design, web browser design, search engine offerings, and online advertising. This has been a topic of active research since the Web started, and a large body of work exists. However, new online services as well as advances in Web and mobile technologies have changed what browsing the Web means, and call for a fresh look at the problem and the research, specifically with respect to whether the models in use are still appropriate. Platforms such as YouTube, Netflix, or last.fm have started to replace traditional media channels (cinema, television, radio) and media distribution formats (CD, DVD, Blu-ray). Social networks (e.g., Facebook) and browser-game platforms have attracted entirely new audiences, in particular less tech-savvy ones. Furthermore, advances in mobile technologies and devices have made browsing on the move the norm and changed user behavior, since mobile browsing is often influenced by the user's location and context in the physical world. Commonly used datasets, such as web server access logs or search engine transaction logs, are inherently unable to capture all these facets of users' browsing behavior. DOBBS (DERI Online Behavior Study) is an effort to create such a dataset in a non-intrusive, completely anonymous, and privacy-preserving way. To this end, DOBBS provides a browser add-on that users can install, which keeps track of their browsing behavior (e.g., how much time they spend on the Web, how long they stay on a website, how often they visit a website, and how they use their browser). In this paper, we outline the motivation behind DOBBS, describe the add-on and the captured data in detail, and present some first results that highlight the strengths of DOBBS.
Problem-Based Learning (PBL) is a popular instructional approach that lets students gain hands-on training by solving problems. Question Pool websites (QPs) such as LeetCode, Code Chef, and Math Playground support PBL by supplying authentic, diverse, and contextualized questions to students. Nonetheless, empirical findings suggest that 40% to 80% of students registered in QPs drop out in less than two months. This research is the first attempt to understand and predict student dropouts from QPs by exploiting students' engagement moods. Adopting a data-driven approach, we identify five engagement moods of QP students, namely challenge-seeker, subject-seeker, interest-seeker, joy-seeker, and non-seeker. We find that students have collective preferences for answering questions in each engagement mood, and that deviating from those preferences significantly increases their probability of dropping out. Last but not least, this paper introduces a new hybrid machine learning model, which we call Dropout-Plus, for predicting student dropouts in QPs. Test results on a popular QP in China with nearly 10K students show that Dropout-Plus can exceed the dropout prediction performance of rival algorithms in terms of accuracy, F1-measure, and AUC. We conclude with design suggestions for QP managers and online learning professionals to reduce student dropouts.
The new digital revolution of big data is deeply changing our capability to understand society and to forecast the outcome of many social and economic systems. Unfortunately, information can be very heterogeneous in the importance, relevance, and surprise it conveys, which severely affects the predictive power of semantic and statistical methods. Here we show that the aggregated behavior of web users can be exploited to overcome this problem in a hard-to-predict complex system, namely the financial market. Specifically, our in-sample analysis shows that the combined use of sentiment analysis of news and browsing activity of Yahoo! Finance users greatly helps forecast intra-day and daily price changes of a set of 100 highly capitalized US stocks traded in the period 2012-2013. Taken alone, sentiment analysis or browsing activity has very small or no predictive power. Conversely, when we consider a news signal that, in a given time interval, is the average sentiment of the clicked news weighted by the number of clicks, we show that for nearly 50% of the companies this signal Granger-causes hourly price returns. Our result indicates a wisdom-of-the-crowd effect that makes it possible to exploit users' activity to identify and properly weigh the relevant and surprising news, considerably enhancing the forecasting power of the news sentiment.
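The click-weighted news signal in this abstract is, for each time interval, the sum over news items of clicks times sentiment, divided by the total number of clicks in that interval. A minimal sketch, assuming sentiment scores and per-interval click counts are already available; the record layout and the function name are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record: (time-interval index, sentiment score of a news item,
# number of clicks that item received in that interval).
Record = Tuple[int, float, int]


def click_weighted_sentiment(records: Iterable[Record]) -> Dict[int, Optional[float]]:
    """Average sentiment of clicked news per interval, weighted by clicks.

    Returns None for an interval whose news items received no clicks.
    """
    weighted_sum: Dict[int, float] = defaultdict(float)
    total_clicks: Dict[int, int] = defaultdict(int)
    for interval, sentiment, clicks in records:
        weighted_sum[interval] += clicks * sentiment
        total_clicks[interval] += clicks
    return {
        i: (weighted_sum[i] / total_clicks[i] if total_clicks[i] > 0 else None)
        for i in sorted(total_clicks)
    }


records = [
    (0, 0.75, 30),  # positive item, heavily clicked
    (0, -0.5, 10),  # negative item, rarely clicked
    (1, -0.25, 0),  # item nobody clicked
]
print(click_weighted_sentiment(records))  # {0: 0.4375, 1: None}
```

For interval 0 the unweighted mean of the two sentiment scores would be 0.125, whereas click weighting yields 0.4375 because the positive item drew most of the attention. Testing whether such a series Granger-causes price returns would be a separate step, not shown here.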