Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits

273 0 0.0 ( 0 )

Download Cite

Added by Ilai Bistritz

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Ilai Bistritz - Tavor Z. Baharav - Amir Leshem

Computer Science and Game Theory Multiagent Systems

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, representable as an NxM matrix. These utilities are unknown to the players. In each turn players select an arm and receive a noisy observation of their utility for it. However, if any other players selected the same arm that turn, all colliding players will all receive zero utility due to the conflict. No other communication or coordination between the players is possible. Our goal is to design a distributed algorithm that learns the matching between players and arms that achieves max-min fairness while minimizing the regret. We present an algorithm and prove that it is regret optimal up to a $loglog T$ factor. This is the first max-min fairness multi-player bandit algorithm with (near) order optimal regret.

rate research

Restricted Max-Min Fair Allocation

167 - Siu-Wing Cheng , Yuchen Mao 2018

The restricted max-min fair allocation problem seeks an allocation of resources to players that maximizes the minimum total value obtained by any player. It is NP-hard to approximate the problem to a ratio less than 2. Comparing the current best algorithm for estimating the optimal value with the current best for constructing an allocation, there is quite a gap between the ratios that can be achieved in polynomial time: roughly 4 for estimation and roughly $6 + 2sqrt{10}$ for construction. We propose an algorithm that constructs an allocation with value within a factor of $6 + delta$ from the optimum for any constant $delta > 0$. The running time is polynomial in the input size for any constant $delta$ chosen.

Data Structures and Algorithms

Distributed Learning in Ad-Hoc Networks: A Multi-player Multi-armed Bandit Framework

174 - Sumit J. Darak , Manjesh K.Hanawal 2020

Next-generation networks are expected to be ultra-dense with a very high peak rate but relatively lower expected traffic per user. For such scenario, existing central controller based resource allocation may incur substantial signaling (control communications) leading to a negative effect on the quality of service (e.g. drop calls), energy and spectrum efficiency. To overcome this problem, cognitive ad-hoc networks (CAHN) that share spectrum with other networks are being envisioned. They allow some users to identify and communicate in `free slots thereby reducing signaling load and allowing the higher number of users per base stations (dense networks). Such networks open up many interesting challenges such as resource identification, coordination, dynamic and context-aware adaptation for which Machine Learning and Artificial Intelligence framework offers novel solutions. In this paper, we discuss state-of-the-art multi-armed multi-player bandit based distributed learning algorithms that allow users to adapt to the environment and coordinate with other players/users. We also discuss various open research problems for feasible realization of CAHN and interesting applications in other domains such as energy harvesting, Internet of Things, and Smart grids.

Networking and Internet Architecture Machine Learning Signal Processing

Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal Regret With Neither Communication Nor Collisions

74 - Sebastien Bubeck , Thomas Budzinski , Mark Sellke 2020

We consider the cooperative multi-player version of the stochastic multi-armed bandit problem. We study the regime where the players cannot communicate but have access to shared randomness. In prior work by the first two authors, a strategy for this regime was constructed for two players and three arms, with regret $tilde{O}(sqrt{T})$, and with no collisions at all between the players (with very high probability). In this paper we show that these properties (near-optimal regret and no collisions at all) are achievable for any number of players and arms. At a high level, the previous strategy heavily relied on a $2$-dimensional geometric intuition that was difficult to generalize in higher dimensions, while here we take a more combinatorial route to build the new strategy.

Machine Learning Multiagent Systems Machine Learning

Max-min Fairness in 802.11 Mesh Networks

449 - Douglas J. Leith , Qizhi Cao , Vijay G. Subramanian 2010

In this paper we build upon the recent observation that the 802.11 rate region is log-convex and, for the first time, characterise max-min fair rate allocations for a large class of 802.11 wireless mesh networks. By exploiting features of the 802.11e/n MAC, in particular TXOP packet bursting, we are able to use this characterisation to establish a straightforward, practically implementable approach for achieving max-min throughput fairness. We demonstrate that this approach can be readily extended to encompass time-based fairness in multi-rate 802.11 mesh networks.

Networking and Internet Architecture

Multi-Player Bandits Revisited

102 - Lilian Besson , Emilie Kaufmann (CRIStAL 2017

Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assume that sensing information is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound for the regret of any decentralized algorithms and introduce two algorithms, RandTopM and MCTopM, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called Selfish, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for the understanding of its behavior.

Machine Learning Machine Learning

comments

Fetching comments

Yarmouk Private University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits

Ask ChatGPT about the research

No Arabic abstract

Read More