No Arabic abstract
It is common for video-on-demand and music streaming services to adopt a user interface composed of several recommendation lists, i.e. widgets or swipeable carousels, each generated according to a specific criterion or algorithm (e.g. most recent, top popular, recommended for you, editors choice, etc.). Selecting the appropriate combination of carousel has significant impact on user satisfaction. A crucial aspect of this user interface is that to measure the relevance a new carousel for the user it is not sufficient to account solely for its individual quality. Instead, it should be considered that other carousels will already be present in the interface. This is not considered by traditional evaluation protocols for recommenders systems, in which each carousel is evaluated in isolation, regardless of (i) which other carousels are displayed to the user and (ii) the relative position of the carousel with respect to other carousels. Hence, we propose a two-dimensional evaluation protocol for a carousel setting that will measure the quality of a recommendation carousel based on how much it improves upon the quality of an already available set of carousels. Our evaluation protocol takes into account also the position bias, i.e. users do not explore the carousels sequentially, but rather concentrate on the top-left corner of the screen. We report experiments on the movie domain and notice that under a carousel setting the definition of which criteria has to be preferred to generate a list of recommended items changes with respect to what is commonly understood.
Many video-on-demand and music streaming services provide the user with a page consisting of several recommendation lists, i.e. widgets or swipeable carousels, each built with a specific criterion (e.g. most recent, TV series, etc.). Finding efficient strategies to select which carousels to display is an active research topic of great industrial interest. In this setting, the overall quality of the recommendations of a new algorithm cannot be assessed by measuring solely its individual recommendation quality. Rather, it should be evaluated in a context where other recommendation lists are already available, to account for how they complement each other. This is not considered by traditional offline evaluation protocols. Hence, we propose an offline evaluation protocol for a carousel setting in which the recommendation quality of a model is measured by how much it improves upon that of an already available set of carousels. We report experiments on publicly available datasets on the movie domain and notice that under a carousel setting the ranking of the algorithms change. In particular, when a SLIM carousel is available, matrix factorization models tend to be preferred, while item-based models are penalized. We also propose to extend ranking metrics to the two-dimensional carousel layout in order to account for a known position bias, i.e. users will not explore the lists sequentially, but rather concentrate on the top-left corner of the screen.
Community based question answering services have arisen as a popular knowledge sharing pattern for netizens. With abundant interactions among users, individuals are capable of obtaining satisfactory information. However, it is not effective for users to attain answers within minutes. Users have to check the progress over time until the satisfying answers submitted. We address this problem as a user personalized satisfaction prediction task. Existing methods usually exploit manual feature selection. It is not desirable as it requires careful design and is labor intensive. In this paper, we settle this issue by developing a new multiple instance deep learning framework. Specifically, in our settings, each question follows a weakly supervised learning multiple instance learning assumption, where its obtained answers can be regarded as instance sets and we define the question resolved with at least one satisfactory answer. We thus design an efficient framework exploiting multiple instance learning property with deep learning to model the question answer pairs. Extensive experiments on large scale datasets from Stack Exchange demonstrate the feasibility of our proposed framework in predicting askers personalized satisfaction. Our framework can be extended to numerous applications such as UI satisfaction Prediction, multi armed bandit problem, expert finding and so on.
Visualization recommendation work has focused solely on scoring visualizations based on the underlying dataset and not the actual user and their past visualization feedback. These systems recommend the same visualizations for every user, despite that the underlying user interests, intent, and visualization preferences are likely to be fundamentally different, yet vitally important. In this work, we formally introduce the problem of personalized visualization recommendation and present a generic learning framework for solving it. In particular, we focus on recommending visualizations personalized for each individual user based on their past visualization interactions (e.g., viewed, clicked, manually created) along with the data from those visualizations. More importantly, the framework can learn from visualizations relevant to other users, even if the visualizations are generated from completely different datasets. Experiments demonstrate the effectiveness of the approach as it leads to higher quality visualization recommendations tailored to the specific user intent and preferences. To support research on this new problem, we release our user-centric visualization corpus consisting of 17.4k users exploring 94k datasets with 2.3 million attributes and 32k user-generated visualizations.
As a highly data-driven application, recommender systems could be affected by data bias, resulting in unfair results for different data groups, which could be a reason that affects the system performance. Therefore, it is important to identify and solve the unfairness issues in recommendation scenarios. In this paper, we address the unfairness problem in recommender systems from the user perspective. We group users into advantaged and disadvantaged groups according to their level of activity, and conduct experiments to show that current recommender systems will behave unfairly between two groups of users. Specifically, the advantaged users (active) who only account for a small proportion in data enjoy much higher recommendation quality than those disadvantaged users (inactive). Such bias can also affect the overall performance since the disadvantaged users are the majority. To solve this problem, we provide a re-ranking approach to mitigate this unfairness problem by adding constraints over evaluation metrics. The experiments we conducted on several real-world datasets with various recommendation algorithms show that our approach can not only improve group fairness of users in recommender systems, but also achieve better overall recommendation performance.
Visualization recommendation seeks to generate, score, and recommend to users useful visualizations automatically, and are fundamentally important for exploring and gaining insights into a new or existing dataset quickly. In this work, we propose the first end-to-end ML-based visualization recommendation system that takes as input a large corpus of datasets and visualizations, learns a model based on this data. Then, given a new unseen dataset from an arbitrary user, the model automatically generates visualizations for that new dataset, derive scores for the visualizations, and output a list of recommended visualizations to the user ordered by effectiveness. We also describe an evaluation framework to quantitatively evaluate visualization recommendation models learned from a large corpus of visualizations and datasets. Through quantitative experiments, a user study, and qualitative analysis, we show that our end-to-end ML-based system recommends more effective and useful visualizations compared to existing state-of-the-art rule-based systems. Finally, we observed a strong preference by the human experts in our user study towards the visualizations recommended by our ML-based system as opposed to the rule-based system (5.92 from a 7-point Likert scale compared to only 3.45).