Do Offline Metrics Predict Online Performance in Recommender Systems?

82 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Wenshuo Guo

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Karl Krauth - Sarah Dean - Alex Zhao

استرجاع المعلومات التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Recommender systems operate in an inherently dynamical setting. Past recommendations influence future behavior, including which data points are observed and how user preferences change. However, experimenting in production systems with real user dynamics is often infeasible, and existing simulation-based approaches have limited scale. As a result, many state-of-the-art algorithms are designed to solve supervised learning problems, and progress is judged only by offline metrics. In this work we investigate the extent to which offline metrics predict online performance by evaluating eleven recommenders across six controlled simulated environments. We observe that offline metrics are correlated with online performance over a range of environments. However, improvements in offline metrics lead to diminishing returns in online performance. Furthermore, we observe that the ranking of recommenders varies depending on the amount of initial offline data available. We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm. We provide the environments and recommenders described in this paper as Reclab: an extensible ready-to-use simulation framework at https://github.com/berkeley-reclab/RecLab.

قيم البحث

134 - Xiangyu Zhao , Changsheng Gu , Haoshenglun Zhang 2019

With the recent prevalence of Reinforcement Learning (RL), there have been tremendous interests in utilizing RL for online advertising in recommendation platforms (e.g., e-commerce and news feed sites). However, most RL-based advertising algorithms f ocus on optimizing ads revenue while ignoring the possible negative influence of ads on user experience of recommended items (products, articles and videos). Developing an optimal advertising algorithm in recommendations faces immense challenges because interpolating ads improperly or too frequently may decrease user experience, while interpolating fewer ads will reduce the advertising revenue. Thus, in this paper, we propose a novel advertising strategy for the rec/ads trade-off. To be specific, we develop an RL-based framework that can continuously update its advertising strategies and maximize reward in the long run. Given a recommendation list, we design a novel Deep Q-network architecture that can determine three internally related tasks jointly, i.e., (i) whether to interpolate an ad or not in the recommendation list, and if yes, (ii) the optimal ad and (iii) the optimal location to interpolate. The experimental results based on real-world data demonstrate the effectiveness of the proposed framework.

استرجاع المعلومات التعلم الآلي

A Methodology for the Offline Evaluation of Recommender Systems in a User Interface with Multiple Carousels

104 - Nicol`o Felicioni , Maurizio Ferrari Dacrema , Paolo Cremonesi 2021

Many video-on-demand and music streaming services provide the user with a page consisting of several recommendation lists, i.e. widgets or swipeable carousels, each built with a specific criterion (e.g. most recent, TV series, etc.). Finding efficien t strategies to select which carousels to display is an active research topic of great industrial interest. In this setting, the overall quality of the recommendations of a new algorithm cannot be assessed by measuring solely its individual recommendation quality. Rather, it should be evaluated in a context where other recommendation lists are already available, to account for how they complement each other. This is not considered by traditional offline evaluation protocols. Hence, we propose an offline evaluation protocol for a carousel setting in which the recommendation quality of a model is measured by how much it improves upon that of an already available set of carousels. We report experiments on publicly available datasets on the movie domain and notice that under a carousel setting the ranking of the algorithms change. In particular, when a SLIM carousel is available, matrix factorization models tend to be preferred, while item-based models are penalized. We also propose to extend ranking metrics to the two-dimensional carousel layout in order to account for a known position bias, i.e. users will not explore the lists sequentially, but rather concentrate on the top-left corner of the screen.

استرجاع المعلومات التعلم الآلي

Unbiased Learning to Rank: Online or Offline?

331 - Qingyao Ai , Tao Yang , Huazheng Wang 2020

How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for IR. Existing work on unbiased learning to rank (ULTR) can be broadly categorized into two groups -- the studies on unbiased le arning algorithms with logged data, namely the textit{offline} unbiased learning, and the studies on unbiased parameters estimation with real-time user interactions, namely the textit{online} learning to rank. While their definitions of textit{unbiasness} are different, these two types of ULTR algorithms share the same goal -- to find the best models that rank documents based on their intrinsic relevance or utility. However, most studies on offline and online unbiased learning to rank are carried in parallel without detailed comparisons on their background theories and empirical performance. In this paper, we formalize the task of unbiased learning to rank and show that existing algorithms for offline unbiased learning and online learning to rank are just the two sides of the same coin. We evaluate six state-of-the-art ULTR algorithms and find that most of them can be used in both offline settings and online environments with or without minor modifications. Further, we analyze how different offline and online learning paradigms would affect the theoretical foundation and empirical effectiveness of each algorithm on both synthetic and real search data. Our findings could provide important insights and guideline for choosing and deploying ULTR algorithms in practice.

استرجاع المعلومات التعلم الآلي

Do recommender systems function in the health domain: a system review

149 - Jia Su , Yi Guan , Yuge Li 2020

Recommender systems have fulfilled an important role in everyday life. Recommendations such as news by Google, videos by Netflix, goods by e-commerce providers, etc. have heavily changed everyones lifestyle. Health domains contain similar decision-ma king problems such as what to eat, how to exercise, and what is the proper medicine for a patient. Recently, studies focused on recommender systems to solve health problems have attracted attention. In this paper, we review aspects of health recommender systems including interests, methods, evaluation, future challenges and trend issues. We find that 1) health recommender systems have their own health concern limitations that cause them to focus on less-risky recommendations such as diet recommendation; 2) traditional recommender methods such as content-based and collaborative filtering methods can hardly handle health constraints, but knowledge-based methods function more than ever; 3) evaluating a health recommendation is more complicated than evaluating a commercial one because multiple dimensions in addition to accuracy should be considered. Recommender systems can function well in the health domain after the solution of several key problems. Our work is a systematic review of health recommender system studies, we show current conditions and future directions. It is believed that this review will help domain researchers and promote health recommender systems to the next step.

استرجاع المعلومات الحساب واللغة التعلم الآلي

Graph Neural Networks in Recommender Systems: A Survey

94 - Shiwen Wu , Fei Sun , Wentao Zhang 2020

Owing to the superiority of GNN in learning on graph data and its efficacy in capturing collaborative signals and sequential patterns, utilizing GNN techniques in recommender systems has gain increasing interests in academia and industry. In this sur vey, we provide a comprehensive review of the most recent works on GNN-based recommender systems. We proposed a classification scheme for organizing existing works. For each category, we briefly clarify the main issues, and detail the corresponding strategies adopted by the representative models. We also discuss the advantages and limitations of the existing strategies. Furthermore, we suggest several promising directions for future researches. We hope this survey can provide readers with a general understanding of the recent progress in this field, and shed some light on future developments.

استرجاع المعلومات التعلم الآلي