We consider multi-armed bandit problems in social groups wherein each individual has bounded memory and shares the common goal of learning the best arm/option. We say an individual learns the best option if eventually (as $t \to \infty$) it pulls only the arm with the highest expected reward. While this goal is provably impossible for an isolated individual due to bounded memory, we show that, in social groups, this goal can be achieved easily with the aid of social persuasion (i.e., communication) as long as the communication networks/graphs satisfy some mild conditions. To deal with the interplay between the randomness in the rewards and in the social interaction, we employ the {\em mean-field approximation} method. Considering the possibility that the individuals in the networks may not be exchangeable when the communication networks are not cliques, we go beyond the classic mean-field techniques and apply a refined version of mean-field approximation: (1) Using coupling, we show that if the communication graph is connected and is either regular or has a doubly stochastic degree-weighted adjacency matrix, then, with probability $\to 1$ as the social group size $N \to \infty$, every individual in the social group learns the best option. (2) If the minimum degree of the graph diverges as $N \to \infty$, then, over an arbitrary but given finite time horizon, the sample paths describing the opinion evolutions of the individuals are asymptotically independent. In addition, the proportions of the population with different opinions converge to the unique solution of a system of ODEs. In the solution of the obtained ODEs, the proportion of the population holding the correct opinion converges to $1$ exponentially fast in time. Notably, our results hold even if the communication graphs are highly sparse.
Naor, Parter, and Yogev [SODA 2020] recently designed a compiler for automatically translating standard centralized interactive protocols to distributed interactive protocols, as introduced by Kol, Oshman, and Saxena [PODC 2018]. In particular, by using this compiler, every linear-time algorithm for deciding membership in some fixed graph class can be translated into a $\mathsf{dMAM}(O(\log n))$ protocol for this class, that is, a distributed interactive protocol with $O(\log n)$-bit proof size in $n$-node graphs, and three interactions between the (centralized) computationally unbounded but untrusted prover Merlin and the (decentralized) randomized, computationally limited verifier Arthur. As a corollary, there is a $\mathsf{dMAM}(O(\log n))$ protocol for the class of planar graphs, as well as for the class of graphs with bounded genus. We show that there exists a distributed interactive protocol for the class of graphs with bounded genus performing just a single interaction, from the prover to the verifier, yet preserving a proof size of $O(\log n)$ bits. This result also holds for the class of graphs with bounded demi-genus, that is, graphs that can be embedded on a non-orientable surface of bounded genus. The interactive protocols described in this paper are actually proof-labeling schemes, i.e., a subclass of interactive protocols previously introduced by Korman, Kutten, and Peleg [PODC 2005]. In particular, these schemes do not require any randomization from the verifier, and the proofs may often be computed a priori, at low cost, by the nodes themselves. Our results thus extend the recent proof-labeling scheme for planar graphs by Feuilloley et al. [PODC 2020] to graphs of bounded genus and to graphs of bounded demi-genus.
The Minimum Dominating Set (MDS) problem is not only one of the most fundamental problems in distributed computing, but also one of the most challenging ones. While it is well-known that minimum dominating sets cannot be approximated locally on general graphs, over the last years, several breakthroughs have been made on computing local approximations on sparse graphs. This paper presents a deterministic and local constant-factor approximation for minimum dominating sets on bounded genus graphs, a very large family of sparse graphs. Our main technical contribution is a new analysis of a slightly modified, first-order definable variant of an existing algorithm by Lenzen et al. Interestingly, unlike existing proofs for planar graphs, our analysis does not rely on any topological arguments. We believe that our techniques can be useful for the study of local problems on sparse graphs beyond the scope of this paper.
Identifying clusters of similar elements in a set is a common task in data analysis. With the immense growth of data and physical limitations on single-processor speed, it is necessary to find efficient parallel algorithms for clustering tasks. In this paper, we study the problem of correlation clustering in bounded arboricity graphs with respect to the Massively Parallel Computation (MPC) model. More specifically, we are given a complete graph where the edges are either positive or negative, indicating whether pairs of vertices are similar or dissimilar. The task is to partition the vertices into clusters with as few disagreements as possible. That is, we want to minimize the number of positive inter-cluster edges and negative intra-cluster edges. Consider an input graph $G$ on $n$ vertices such that the positive edges induce a $\lambda$-arboric graph. Our main result is a 3-approximation (\textit{in expectation}) algorithm to correlation clustering that runs in $\mathcal{O}(\log \lambda \cdot \textrm{poly}(\log \log n))$ MPC rounds in the \textit{strongly sublinear memory regime}. This is obtained by combining structural properties of correlation clustering on bounded arboricity graphs with the insights of Fischer and Noever (SODA 18) on randomized greedy MIS and the \texttt{PIVOT} algorithm of Ailon, Charikar, and Newman (STOC 05). Combined with known graph matching algorithms, our structural property also implies an exact algorithm and algorithms with \textit{worst case} $(1+\epsilon)$-approximation guarantees in the special case of forests, where $\lambda = 1$.
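To make the disagreement objective and the role of the \texttt{PIVOT} subroutine concrete, the following minimal Python sketch implements the classical sequential pivot-based clustering of Ailon, Charikar, and Newman together with the objective being minimized. It illustrates only the sequential baseline (a 3-approximation in expectation), not the MPC algorithm above; all function and variable names are ours.

```python
import random

def pivot_clustering(n, positive_edges, seed=None):
    """Sequential PIVOT (Ailon, Charikar, Newman): repeatedly pick a uniformly
    random unclustered vertex and cluster it with its still-unclustered
    positive neighbours."""
    rng = random.Random(seed)
    pos = {v: set() for v in range(n)}          # adjacency over positive edges only
    for u, v in positive_edges:
        pos[u].add(v)
        pos[v].add(u)
    unclustered, clusters = set(range(n)), []
    while unclustered:
        p = rng.choice(sorted(unclustered))     # uniformly random pivot
        cluster = {p} | (pos[p] & unclustered)  # pivot plus its positive neighbours
        unclustered -= cluster
        clusters.append(cluster)
    return clusters

def disagreements(n, positive_edges, clusters):
    """Objective: positive edges across clusters plus negative edges inside clusters."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    pos = {frozenset(e) for e in positive_edges}
    return sum((frozenset((u, v)) in pos) != (label[u] == label[v])
               for u in range(n) for v in range(u + 1, n))

# Example: a positive triangle plus an isolated vertex gives zero disagreements.
edges = [(0, 1), (1, 2), (0, 2)]
print(disagreements(4, edges, pivot_clustering(4, edges, seed=1)))
```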
We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new, provably convergent variant of Q-learning that seeks to improve the performance of standard Q-learning in stochastic environments through the use of ``lookahead'' upper and lower bounds. To do this, LBQL employs previously collected experience and each iteration's state-action values as dual feasible penalties to construct a sequence of sampled information relaxation problems. The solutions to these problems provide estimated upper and lower bounds on the optimal value, which we track via stochastic approximation. These quantities are then used to constrain the iterates to stay within the bounds at every iteration. Numerical experiments on benchmark problems show that LBQL exhibits faster convergence and greater robustness to hyperparameters than standard Q-learning and several related techniques. Our approach is particularly appealing in problems that require expensive simulations or real-world interactions.
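As a rough illustration of the final step described above, here is a hedged Python sketch of a single bound-constrained Q-learning update. The information-relaxation machinery that produces the upper and lower bound samples is not reproduced; it is abstracted behind the placeholder arguments `lower_est` and `upper_est`, and all names and step sizes are our own assumptions rather than the paper's implementation.

```python
import numpy as np

def bounded_q_update(Q, L, U, s, a, r, s_next,
                     alpha=0.1, beta=0.05, gamma=0.95,
                     lower_est=None, upper_est=None):
    """Sketch of a bounded Q-learning step: a standard Q-learning update whose
    result is projected into tracked lower/upper bounds [L, U]. The arguments
    lower_est / upper_est stand in for sampled information-relaxation bounds."""
    target = r + gamma * np.max(Q[s_next])            # usual one-step lookahead target
    q_new = (1 - alpha) * Q[s, a] + alpha * target    # standard Q-learning iterate

    # Track the bound estimates with a stochastic-approximation (smoothing) step.
    if lower_est is not None:
        L[s, a] = (1 - beta) * L[s, a] + beta * lower_est
    if upper_est is not None:
        U[s, a] = (1 - beta) * U[s, a] + beta * upper_est

    # Constrain the iterate to stay within the current bounds.
    Q[s, a] = np.clip(q_new, L[s, a], U[s, a])
    return Q, L, U
```

The essential idea being illustrated is only the last assignment: the usual Q-learning iterate is projected into the interval $[L(s,a), U(s,a)]$ maintained from the bound estimates.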
We study the problem of finding large cuts in $d$-regular triangle-free graphs. In prior work, Shearer (1992) gives a randomised algorithm that finds a cut of expected size $(1/2 + 0.177/\sqrt{d})m$, where $m$ is the number of edges. We give a simpler algorithm that does much better: it finds a cut of expected size $(1/2 + 0.28125/\sqrt{d})m$. As a corollary, this shows that in any $d$-regular triangle-free graph there exists a cut of at least this size. Our algorithm can be interpreted as a very efficient randomised distributed algorithm: each node needs to produce only one random bit, and the algorithm runs in one synchronous communication round. This work is also a case study of applying computational techniques in the design of distributed algorithms: our algorithm was designed by a computer program that searched for optimal algorithms for small values of $d$.
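For intuition, here is a small Python simulation of one member of this one-round class of algorithms: every node draws a single random bit, exchanges it with its neighbours in one synchronous round, and then flips its bit if too many neighbours agree with it. The threshold rule (parameter `tau`) and the names are illustrative choices of ours, not necessarily the rule found by the authors' computer search.

```python
import random

def one_round_cut(adj, tau=0.5, seed=None):
    """Illustrative one-round distributed cut heuristic: each node uses exactly
    one random bit and one synchronous round of communication with neighbours.
    `adj` maps each node (an integer) to the list of its neighbours."""
    rng = random.Random(seed)
    bit = {v: rng.randint(0, 1) for v in adj}            # one random bit per node
    side = {}
    for v, nbrs in adj.items():
        agree = sum(bit[v] == bit[u] for u in nbrs)      # info received in the round
        side[v] = bit[v] ^ (agree >= tau * len(nbrs))    # flip on heavy agreement
    return side

def cut_size(adj, side):
    """Number of edges whose endpoints end up on different sides."""
    return sum(side[u] != side[v] for u in adj for v in adj[u] if u < v)

# Example: the 6-cycle, a 2-regular triangle-free graph.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(cut_size(cycle, one_round_cut(cycle, seed=0)))
```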