The Perils of Exploration under Competition: A Computational Modeling Approach

Published by Guy Aridor

Publication date: 2019

Research field: Informatics Engineering

Language: English

Authors: Guy Aridor, Kevin Liu, Aleksandrs Slivkins

Computer Science and Game Theory, Machine Learning

We empirically study the interplay between exploration and competition. Systems that learn from interactions with users often engage in exploration: making potentially suboptimal decisions in order to acquire new information for future decisions. However, when multiple systems are competing for the same market of users, exploration may hurt a system's reputation in the near term, with adverse competitive effects. In particular, a system may enter a death spiral, in which the short-term reputation cost decreases the number of users for the system to learn from, which degrades its performance relative to the competition and further decreases its market share. We ask whether better exploration algorithms are incentivized under competition. We run extensive numerical experiments in a stylized duopoly model in which two firms deploy multi-armed bandit algorithms and compete for myopic users. We find that duopoly and monopoly tend to favor a primitive greedy algorithm that does not explore and leads to low consumer welfare, whereas a temporary monopoly (a duopoly with an early entrant) may incentivize better bandit algorithms and lead to higher consumer welfare. Our findings shed light on the first-mover advantage in the digital economy by exploring the role that data can play as a barrier to entry in online markets.
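The duopoly setting described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' experimental code: the abstract does not specify how reputation is computed, so the sliding-window average of recent rewards used here is an assumption, as are the Bernoulli arms, the warm-start rounds, the choice of Thompson Sampling as the exploring algorithm, and every name in the block (`Firm`, `simulate`, `window`, `warm_start`).

```python
import random
from collections import deque


class Firm:
    """One firm running a bandit algorithm over Bernoulli arms (hypothetical helper)."""

    def __init__(self, n_arms, policy, window=20):
        self.policy = policy                # "greedy" or "thompson"
        self.wins = [0] * n_arms            # observed successes per arm
        self.pulls = [0] * n_arms           # observed pulls per arm
        self.recent = deque(maxlen=window)  # rewards behind the reputation score (assumed)
        self.users = 0                      # users who picked this firm

    def choose_arm(self, rng):
        if self.policy == "thompson":
            # Sample each arm's mean from its Beta(1 + wins, 1 + losses) posterior.
            scores = [rng.betavariate(1 + w, 1 + n - w)
                      for w, n in zip(self.wins, self.pulls)]
        else:
            # Greedy: posterior mean under the same Beta(1, 1) prior, no exploration.
            scores = [(1 + w) / (2 + n) for w, n in zip(self.wins, self.pulls)]
        return max(range(len(scores)), key=scores.__getitem__)

    def serve(self, arm_means, rng):
        arm = self.choose_arm(rng)
        reward = 1 if rng.random() < arm_means[arm] else 0
        self.wins[arm] += reward
        self.pulls[arm] += 1
        self.recent.append(reward)
        return reward

    def reputation(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0


def simulate(arm_means, policies, horizon=2000, warm_start=20, seed=0):
    """Return how many of `horizon` users chose each firm."""
    rng = random.Random(seed)
    firms = [Firm(len(arm_means), p) for p in policies]
    # Assumed warm start: each firm gets a few free rounds so reputations are defined.
    for firm in firms:
        for _ in range(warm_start):
            firm.serve(arm_means, rng)
    for _ in range(horizon):
        reps = [f.reputation() for f in firms]
        best = max(reps)
        # Myopic user: pick the firm with the highest reputation, ties broken at random.
        chosen = rng.choice([f for f, r in zip(firms, reps) if r == best])
        chosen.serve(arm_means, rng)  # only the chosen firm gets new data
        chosen.users += 1
    return [f.users for f in firms]


shares = simulate([0.3, 0.5, 0.7], ["greedy", "thompson"])
print(shares)  # users won by the greedy firm and by the Thompson Sampling firm
```

A single seeded run says little on its own; averaging market shares over many seeds and arm configurations would be needed before drawing any comparison between the two policies.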

Guy Aridor, Yishay Mansour, Aleksandrs Slivkins (2020)

Most online platforms strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We study the interplay between exploration and competition: how such platforms balance the exploration for learning and the competition for users. Here users play three distinct roles: they are customers that generate revenue, they are sources of data for learning, and they are self-interested agents which choose among the competing platforms. We consider a stylized duopoly model in which two firms face the same multi-armed bandit problem. Users arrive one by one and choose between the two firms, so that each firm makes progress on its bandit problem only if it is chosen. Through a mix of theoretical results and numerical simulations, we study whether and to what extent competition incentivizes the adoption of better bandit algorithms, and whether it leads to welfare increases for users. We find that stark competition induces firms to commit to a greedy bandit algorithm that leads to low welfare. However, weakening competition by providing firms with some free users incentivizes better exploration strategies and increases welfare. We investigate two channels for weakening the competition: relaxing the rationality of users and giving one firm a first-mover advantage. Our findings are closely related to the competition vs. innovation relationship, and elucidate the first-mover advantage in the digital economy.

Computer Science and Game Theory, Machine Learning, Theoretical Economics

Competing Bandits: Learning under Competition

Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu (2017)

Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We initiate a study of the interplay between exploration and competition: how such systems balance the exploration for learning and the competition for users. Here the users play three distinct roles: they are customers that generate revenue, they are sources of data for learning, and they are self-interested agents which choose among the competing systems. In our model, we consider competition between two multi-armed bandit algorithms faced with the same bandit instance. Users arrive one by one and choose between the two algorithms, so that each algorithm makes progress if and only if it is chosen. We ask whether and to what extent competition incentivizes the adoption of better bandit algorithms. We investigate this issue for several models of user response, as we vary the degree of rationality and competitiveness in the model. Our findings are closely related to the competition vs. innovation relationship, a well-studied theme in economics.

Computer Science and Game Theory, Machine Learning

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Stefanos Leonardos, Georgios Piliouras, Kelly Spendlove (2021)

The interplay between exploration and exploitation in competitive multi-agent learning is still far from being well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games, we show that fast convergence of Q-learning in competitive settings is obtained regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
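To make the link between softmax exploration and the quantal-response equilibrium concrete, here is a small sketch. It is a simplified, deterministic, discrete-time Boltzmann Q-update that uses expected payoffs in a two-player zero-sum game, and it is not the dynamics analyzed in the paper. The function names, the step size, the starting Q-values, and the choice of matching pennies as the test game are all assumptions made for illustration.

```python
import math


def softmax(q, temperature):
    """Boltzmann policy; a higher temperature means more exploration."""
    m = max(q)
    exps = [math.exp((v - m) / temperature) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def expected_payoffs(payoff, opponent_policy):
    """Expected payoff of each own action against the opponent's mixed policy."""
    return [sum(p * y for p, y in zip(row, opponent_policy)) for row in payoff]


def smooth_q_learning(payoff_row, temperature=1.0, step=0.1, iters=2000):
    """Simplified two-player zero-sum update; payoff_row[i][j] is the row player's payoff."""
    n, m = len(payoff_row), len(payoff_row[0])
    # Column player's payoffs, indexed [own action][row player's action].
    payoff_col = [[-payoff_row[i][j] for i in range(n)] for j in range(m)]
    q_row = [1.0] + [0.0] * (n - 1)  # arbitrary asymmetric start (assumed)
    q_col = [0.0] * (m - 1) + [0.5]
    for _ in range(iters):
        x = softmax(q_row, temperature)
        y = softmax(q_col, temperature)
        u_row = expected_payoffs(payoff_row, y)
        u_col = expected_payoffs(payoff_col, x)
        # Move each Q-value a small step toward the current expected payoff.
        q_row = [(1 - step) * q + step * u for q, u in zip(q_row, u_row)]
        q_col = [(1 - step) * q + step * u for q, u in zip(q_col, u_col)]
    return softmax(q_row, temperature), softmax(q_col, temperature)


matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
x, y = smooth_q_learning(matching_pennies)
print(x, y)
```

A fixed point of this update satisfies "policy = softmax of expected payoffs", which is the logit form of a QRE. In matching pennies that equilibrium is the uniform mix by symmetry, so both returned policies should approach (0.5, 0.5).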

Computer Science and Game Theory, Machine Learning, Multiagent Systems

A Near-Term Quantum Computing Approach for Hard Computational Problems in Space Exploration

Vadim N. Smelyanskiy, Eleanor G. Rieffel, Sergey I. Knysh (2012)

In this article, we show how to map a sampling of the hardest artificial intelligence problems in space exploration onto equivalent Ising models that can then be attacked using quantum annealing implemented on a D-Wave machine. We overview the existing results as well as propose new Ising model implementations for quantum annealing. We review supervised and unsupervised learning algorithms for classification and clustering with applications to feature identification and anomaly detection. We introduce algorithms for data fusion and image matching for remote sensing applications. We overview planning problems for space exploration mission applications and algorithms for diagnostics and recovery with applications to deep space missions. We describe combinatorial optimization algorithms for task assignment in the context of autonomous unmanned exploration. Finally, we discuss ways to circumvent the limitations of the Ising mapping using a blackbox approach based on ideas from probabilistic computing. In this article we describe the architecture of the D-Wave One machine and report its benchmarks. Results on a random ensemble of problems in the range of up to 96 qubits show improved scaling for median core quantum annealing time compared with classical algorithms; whether this scaling persists for larger problem sizes is an open question. We also review previous results of D-Wave One benchmarking studies for solving binary classification problems with a quantum boosting algorithm, which is shown to outperform AdaBoost. We review quantum algorithms for structured learning for multi-label classification and introduce a hybrid classical/quantum approach for learning the weights. Results of D-Wave One benchmarking studies for learning structured labels on four different data sets show better performance compared with an independent Support Vector Machine approach with a linear kernel.

Quantum Physics

Selecting a Match: Exploration vs Decision

Ishan Agarwal, Richard Cole, Yixin Tao (2021)

In a dynamic matching market, such as a marriage or job market, how should agents balance accepting a proposed match with the cost of continuing their search? We consider this problem in a discrete setting, in which agents have cardinal values and finite lifetimes, and proposed matches are random. We seek to quantify how well the agents can do. We provide upper and lower bounds on the collective losses of the agents, with a polynomially small failure probability, where the notion of loss is with respect to a plausible baseline we define. These bounds are tight up to constant factors. We highlight two aspects of this work. First, in our model, agents have a finite time in which to enjoy their matches, namely the minimum of their remaining lifetime and that of their partner; this implies that unmatched agents become less desirable over time, and suggests that their decision rules should change over time. Second, we use a discrete rather than a continuum model for the population. The discreteness causes variance which induces localized imbalances in the two sides of the market. One of the main technical challenges we face is to bound these imbalances. In addition, we present the results of simulations on moderate-sized problems for both the discrete and continu

Computer Science and Game Theory