
Optimal Strategies for Graph-Structured Bandits


Published by: Hassan Saber

Publication date: 2020

Research field: Information Engineering

Paper language: English

Author: Hassan Saber

Information Theory, Machine Learning


We study a structured variant of the multi-armed bandit problem specified by a set of Bernoulli distributions $\nu = (\nu_{a,b})_{a \in \mathcal{A}, b \in \mathcal{B}}$ with means $(\mu_{a,b})_{a \in \mathcal{A}, b \in \mathcal{B}} \in [0,1]^{\mathcal{A} \times \mathcal{B}}$ and by a given weight matrix $\omega = (\omega_{b,b'})_{b,b' \in \mathcal{B}}$, where $\mathcal{A}$ is a finite set of arms and $\mathcal{B}$ is a finite set of users. The weight matrix $\omega$ is such that, for any two users $b, b' \in \mathcal{B}$, $\max_{a \in \mathcal{A}} |\mu_{a,b} - \mu_{a,b'}| \leq \omega_{b,b'}$. This formulation is flexible enough to capture a range of situations, from highly structured scenarios ($\omega \in \{0,1\}^{\mathcal{B} \times \mathcal{B}}$) to fully unstructured setups ($\omega \equiv 1$).

We consider two scenarios, depending on whether the learner chooses only the actions to sample rewards from, or both the users and the actions. We first derive problem-dependent lower bounds on the regret for this generic graph structure. These bounds involve a structure-dependent linear programming problem. Second, we adapt the Indexed Minimum Empirical Divergence (IMED) algorithm introduced by Honda and Takemura (2015) to this setting and introduce the IMED-GS$^\star$ algorithm. Interestingly, IMED-GS$^\star$ does not need to solve the linear programming problem more than about $\log(T)$ times after $T$ steps, while being provably asymptotically optimal. Also, unlike existing bandit strategies designed for other popular structures, IMED-GS$^\star$ does not resort to an explicit forced-exploration scheme and only makes use of local counts of empirical events. We finally provide numerical illustrations of our results that confirm the performance of IMED-GS$^\star$.
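The sketch below illustrates two ingredients named in the abstract, under stated assumptions. First, it checks the structural constraint $\max_a |\mu_{a,b} - \mu_{a,b'}| \leq \omega_{b,b'}$ on a matrix of means. Second, it computes the classical, unstructured IMED index of Honda and Takemura for Bernoulli arms, $I_a = N_a \, \mathrm{kl}(\hat\mu_a, \hat\mu^\star) + \log N_a$. It is not IMED-GS$^\star$: the structured algorithm additionally exploits $\omega$ and the lower-bound linear program, and neither is reproduced here. The function names and the arms-by-users list layout are hypothetical choices made for illustration.

```python
import math
from typing import Sequence

# Hypothetical helper names; this is plain (unstructured) IMED, not IMED-GS*.


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL divergence kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    # Clip to avoid log(0); kl(p, p) is still exactly 0.
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def satisfies_graph_structure(mu: Sequence[Sequence[float]],
                              omega: Sequence[Sequence[float]],
                              tol: float = 1e-12) -> bool:
    """Check max_a |mu[a][b] - mu[a][b2]| <= omega[b][b2] for every user pair.

    Assumed layout: mu[a][b] is the mean of arm a for user b, and omega is a
    |B| x |B| matrix of weights.
    """
    n_arms, n_users = len(mu), len(mu[0])
    for b in range(n_users):
        for b2 in range(n_users):
            gap = max(abs(mu[a][b] - mu[a][b2]) for a in range(n_arms))
            if gap > omega[b][b2] + tol:  # small tolerance for float error
                return False
    return True


def imed_choose_arm(counts: Sequence[int], means: Sequence[float]) -> int:
    """Return the arm minimizing N_a * kl(mu_a, mu_star) + log N_a.

    Assumes every arm has been pulled at least once (counts[a] >= 1).
    The empirically best arm has kl = 0, so its index is just log N_a.
    """
    best = max(means)
    indexes = [n * bernoulli_kl(m, best) + math.log(n)
               for n, m in zip(counts, means)]
    return min(range(len(indexes)), key=indexes.__getitem__)


if __name__ == "__main__":
    mu = [[0.5, 0.6], [0.3, 0.35]]  # 2 arms x 2 users
    print(satisfies_graph_structure(mu, [[0.0, 0.1], [0.1, 0.0]]))    # True
    print(satisfies_graph_structure(mu, [[0.0, 0.05], [0.05, 0.0]]))  # False
    print(imed_choose_arm([100, 1], [0.6, 0.3]))                      # 1
```

In the last call, the arm that was pulled only once gets the smaller index (about 0.18 versus $\log 100 \approx 4.6$), so it is sampled next. This is how IMED-style indexes explore without a separate forced-exploration phase. Setting every entry of `omega` to 1 makes the first check always pass, because all means lie in $[0,1]$. That is the fully unstructured case.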

Read also

Meghana Bande, Akshayaa Magesh, Venugopal V. Veeravalli (2021)

A stochastic multi-user multi-armed bandit framework is used to develop algorithms for uncoordinated spectrum access. In contrast to prior work, it is assumed that rewards can be non-zero even under collisions, thus allowing the number of users to be greater than the number of channels. The proposed algorithm consists of an estimation phase and an allocation phase. It is shown that if every user adopts the algorithm, the system-wide regret is order-optimal, of order $O(\log T)$ over a time horizon of duration $T$. The regret guarantees hold both when the number of users is greater than and when it is less than the number of channels. The algorithm is extended to the dynamic case where the number of users in the system evolves over time, and is shown to lead to sublinear regret.

Information Theory, Machine Learning

Adversarial Linear Contextual Bandits with Graph-Structured Side Observations

Lingda Wang, Bingcong Li, Huozhi Zhou (2020)

This paper studies adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverages two of the most common categories of side information: contexts and side observations. In this setting, a learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures, which are encoded as a series of feedback graphs. This setting models a variety of applications in social networks, where both contexts and graph-structured side observations are available. Two efficient algorithms are developed based on EXP3. Under mild conditions, our analysis shows that for undirected feedback graphs the first algorithm, EXP3-LGC-U, achieves regret of order $\mathcal{O}(\sqrt{(K+\alpha(G)d)T\log K})$ over the time horizon $T$, where $\alpha(G)$ is the average independence number of the feedback graphs. A slightly weaker result is presented for the directed graph setting as well. The second algorithm, EXP3-LGC-IX, is developed for a special class of problems, for which the regret is reduced to $\mathcal{O}(\sqrt{\alpha(G)dT\log K\log(KT)})$ for both directed and undirected feedback graphs. Numerical tests corroborate the efficiency of the proposed algorithms.

Machine Learning

Testing for Anomalies: Active Strategies and Non-asymptotic Analysis

Dhruva Kartik, Ashutosh Nayyar, Urbashi Mitra (2020)

The problem of verifying whether a multi-component system has anomalies or not is addressed. Each component can be probed over time in a data-driven manner to obtain noisy observations that indicate whether the selected component is anomalous or not. The aim is to minimize the probability of incorrectly declaring the system to be free of anomalies while ensuring that the probability of correctly declaring it to be safe is sufficiently large. This problem is modeled as an active hypothesis testing problem in the Neyman-Pearson setting. Component-selection and inference strategies are designed and analyzed in the non-asymptotic regime. For a specific class of homogeneous problems, stronger (with respect to prior work) non-asymptotic converse and achievability bounds are provided.

Information Theory

Supervised Linear Regression for Graph Learning from Graph Signals

Arun Venkitaraman, Hermina Petric Maretic, Saikat Chatterjee (2018)

We propose a supervised learning approach for predicting an underlying graph from a set of graph signals. Our approach is based on linear regression. In the linear regression model, we predict the edge weights of a graph as the output, given a set of signal values on the nodes of the graph as the input. We obtain the optimal regression coefficients by solving a convex optimization problem with graph-Laplacian-based regularization. The regularization helps to promote a specific graph spectral profile of the graph signals. Simulation experiments demonstrate that our approach predicts well even in the presence of outliers in the input data.

Information Theory

Provably Optimal Algorithms for Generalized Linear Contextual Bandits

Lihong Li, Yu Lu, Dengyong Zhou (2017)

Contextual bandits are widely used in Internet services, from news recommendation to advertising and Web search. Generalized linear models (logistic regression in particular) have demonstrated stronger performance than linear models in many applications where rewards are binary. However, most theoretical analyses of contextual bandits so far have focused on linear bandits. In this work, we propose an upper confidence bound based algorithm for generalized linear contextual bandits, which achieves $\tilde{O}(\sqrt{dT})$ regret over $T$ rounds with $d$-dimensional feature vectors. This regret matches the minimax lower bound, up to logarithmic terms, and improves on the best previous result by a $\sqrt{d}$ factor, assuming the number of arms is fixed. A key component in our analysis is a new, sharp finite-sample confidence bound for maximum-likelihood estimates in generalized linear models, which may be of independent interest. We also analyze a simpler upper confidence bound algorithm, which is useful in practice, and prove that it has optimal regret in certain cases.

Machine Learning, Artificial Intelligence
