My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits

Published by: Ilai Bistritz

Publication date: 2020

Research field: Informatics Engineering

Language: English

Authors: Ilai Bistritz, Tavor Z. Baharav, Amir Leshem

Computer Science and Game Theory, Multiagent Systems

Abstract

Consider N cooperative but non-communicating players, each of whom plays one of M arms for T turns. Players have different utilities for each arm, representable as an $N \times M$ matrix. These utilities are unknown to the players. In each turn, players select an arm and receive a noisy observation of their utility for it. However, if any other player selects the same arm in that turn, all colliding players receive zero utility due to the conflict. No other communication or coordination between the players is possible. Our goal is to design a distributed algorithm that learns the matching between players and arms that achieves max-min fairness while minimizing the regret. We present an algorithm and prove that it is regret optimal up to a $\log\log T$ factor. This is the first max-min fairness multi-player bandit algorithm with (near) order optimal regret.
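To make the target of the learning problem concrete, the following is a minimal sketch of what a max-min fair matching is and how collisions zero out rewards. It is not the distributed algorithm from the paper: it assumes the utility matrix is fully known and noise-free, and it searches all matchings by brute force, which is only practical for very small N and M. The function names and the example matrix `U` are hypothetical and chosen for illustration only.

```python
from itertools import permutations


def max_min_matching(utilities):
    """Brute-force the max-min fair matching for a small N x M matrix (N <= M).

    Returns (assignment, value), where assignment[i] is the arm given to
    player i and value is the smallest utility any player receives.
    Illustrative only: assumes the utilities are known exactly.
    """
    n = len(utilities)
    m = len(utilities[0])
    best_assignment, best_value = None, float("-inf")
    # Every ordered choice of n distinct arms is a collision-free matching.
    for arms in permutations(range(m), n):
        value = min(utilities[i][arms[i]] for i in range(n))
        if value > best_value:
            best_assignment, best_value = arms, value
    return best_assignment, best_value


def collision_rewards(utilities, choices):
    """Expected reward per player for one turn under the collision model.

    A player receives its utility only if no other player chose the same arm;
    all colliding players receive zero. Observation noise is omitted here.
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    return [
        utilities[i][arm] if counts[arm] == 1 else 0.0
        for i, arm in enumerate(choices)
    ]


# Hypothetical 3 x 3 utility matrix (rows: players, columns: arms).
U = [
    [0.9, 0.4, 0.1],
    [0.8, 0.7, 0.2],
    [0.3, 0.6, 0.5],
]

assignment, value = max_min_matching(U)
print(assignment, value)                 # (0, 1, 2) 0.5
print(collision_rewards(U, (0, 0, 2)))   # [0.0, 0.0, 0.5]
```

In this example, players 0 and 1 both value arm 0 most, but the fair matching gives arm 0 to player 0 only, so the worst-off player still receives 0.5. In the setting described in the abstract, players would have to reach such a matching from noisy observations and collision feedback alone.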

Read also

Restricted Max-Min Fair Allocation

Siu-Wing Cheng, Yuchen Mao, 2018

The restricted max-min fair allocation problem seeks an allocation of resources to players that maximizes the minimum total value obtained by any player. It is NP-hard to approximate the problem to a ratio less than 2. Comparing the current best algorithm for estimating the optimal value with the current best for constructing an allocation, there is quite a gap between the ratios that can be achieved in polynomial time: roughly 4 for estimation and roughly $6 + 2\sqrt{10}$ for construction. We propose an algorithm that constructs an allocation with value within a factor of $6 + \delta$ from the optimum for any constant $\delta > 0$. The running time is polynomial in the input size for any constant $\delta$ chosen.

Data Structures and Algorithms

Distributed Learning in Ad-Hoc Networks: A Multi-player Multi-armed Bandit Framework

Sumit J. Darak, Manjesh K. Hanawal, 2020

Next-generation networks are expected to be ultra-dense with a very high peak rate but relatively lower expected traffic per user. For such a scenario, existing central-controller-based resource allocation may incur substantial signaling (control communications), leading to a negative effect on the quality of service (e.g. dropped calls), energy and spectrum efficiency. To overcome this problem, cognitive ad-hoc networks (CAHN) that share spectrum with other networks are being envisioned. They allow some users to identify and communicate in 'free' slots, thereby reducing signaling load and allowing a higher number of users per base station (dense networks). Such networks open up many interesting challenges, such as resource identification, coordination, and dynamic and context-aware adaptation, for which Machine Learning and Artificial Intelligence frameworks offer novel solutions. In this paper, we discuss state-of-the-art multi-armed multi-player bandit based distributed learning algorithms that allow users to adapt to the environment and coordinate with other players/users. We also discuss various open research problems for feasible realization of CAHN and interesting applications in other domains such as energy harvesting, Internet of Things, and Smart grids.

Networking and Internet Architecture, Machine Learning, Signal Processing

Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal Regret With Neither Communication Nor Collisions

Sebastien Bubeck, Thomas Budzinski, Mark Sellke, 2020

We consider the cooperative multi-player version of the stochastic multi-armed bandit problem. We study the regime where the players cannot communicate but have access to shared randomness. In prior work by the first two authors, a strategy for this regime was constructed for two players and three arms, with regret $\tilde{O}(\sqrt{T})$, and with no collisions at all between the players (with very high probability). In this paper we show that these properties (near-optimal regret and no collisions at all) are achievable for any number of players and arms. At a high level, the previous strategy heavily relied on a $2$-dimensional geometric intuition that was difficult to generalize in higher dimensions, while here we take a more combinatorial route to build the new strategy.

Machine Learning, Multiagent Systems

Max-min Fairness in 802.11 Mesh Networks

Douglas J. Leith, Qizhi Cao, Vijay G. Subramanian, 2010

In this paper we build upon the recent observation that the 802.11 rate region is log-convex and, for the first time, characterise max-min fair rate allocations for a large class of 802.11 wireless mesh networks. By exploiting features of the 802.11e/n MAC, in particular TXOP packet bursting, we are able to use this characterisation to establish a straightforward, practically implementable approach for achieving max-min throughput fairness. We demonstrate that this approach can be readily extended to encompass time-based fairness in multi-rate 802.11 mesh networks.

Networking and Internet Architecture

Multi-Player Bandits Revisited

Lilian Besson, Emilie Kaufmann (CRIStAL), 2017

Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing works assume that sensing information is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound for the regret of any decentralized algorithm and introduce two algorithms, RandTopM and MCTopM, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called Selfish, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for the understanding of its behavior.

Machine Learning
