Linear Bandit Algorithms with Sublinear Time Complexity

87 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shuo Yang

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Shuo Yang - Tongzheng Ren - Sanjay Shakkottai

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We propose to accelerate existing linear bandit algorithms to achieve per-step time complexity sublinear in the number of arms $K$. The key to sublinear complexity is the realization that the arm selection in many linear bandit algorithms reduces to the maximum inner product search (MIPS) problem. Correspondingly, we propose an algorithm that approximately solves the MIPS problem for a sequence of adaptive queries yielding near-linear preprocessing time complexity and sublinear query time complexity. Using the proposed MIPS solver as a sub-routine, we present two bandit algorithms (one based on UCB, and the other based on TS) that achieve sublinear time complexity. We explicitly characterize the tradeoff between the per-step time complexity and regret, and show that our proposed algorithms can achieve $O(K^{1-alpha(T)})$ per-step complexity for some $alpha(T) > 0$ and $widetilde O(sqrt{T})$ regret, where $T$ is the time horizon. Further, we present the theoretical limit of the tradeoff, which provides a lower bound for the per-step time complexity. We also discuss other choices of approximate MIPS algorithms and other applications to linear bandit problems.

قيم البحث

82 - Yuko Kuroki , Liyuan Xu , Atsushi Miyauchi 2019

We study the problem of stochastic combinatorial pure exploration (CPE), where an agent sequentially pulls a set of single arms (a.k.a. a super arm) and tries to find the best super arm. Among a variety of problem settings of the CPE, we focus on the full-bandit setting, where we cannot observe the reward of each single arm, but only the sum of the rewards. Although we can regard the CPE with full-bandit feedback as a special case of pure exploration in linear bandits, an approach based on linear bandits is not computationally feasible since the number of super arms may be exponential. In this paper, we first propose a polynomial-time bandit algorithm for the CPE under general combinatorial constraints and provide an upper bound of the sample complexity. Second, we design an approximation algorithm for the 0-1 quadratic maximization problem, which arises in many bandit algorithms with confidence ellipsoids. Based on our approximation algorithm, we propose novel bandit algorithms for the top-k selection problem, and prove that our algorithms run in polynomial time. Finally, we conduct experiments on synthetic and real-world datasets, and confirm the validity of our theoretical analysis in terms of both the computation time and the sample complexity.

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي

Corralling Stochastic Bandit Algorithms

88 - Raman Arora , Teodor V. Marinov , Mehryar Mohri 2020

We study the problem of corralling stochastic bandit algorithms, that is combining multiple bandit algorithms designed for a stochastic environment, with the goal of devising a corralling algorithm that performs almost as well as the best base algori thm. We give two general algorithms for this setting, which we show benefit from favorable regret guarantees. We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward, and depends on the gap between the highest reward and other rewards.

التعلم الآلي التعلم الالي

Bandit Linear Control

202 - Asaf Cassel 2020

We consider the problem of controlling a known linear dynamical system under stochastic noise, adversarially chosen costs, and bandit feedback. Unlike the full feedback setting where the entire cost function is revealed after each decision, here only the cost incurred by the learner is observed. We present a new and efficient algorithm that, for strongly convex and smooth costs, obtains regret that grows with the square root of the time horizon $T$. We also give extensions of this result to general convex, possibly non-smooth costs, and to non-stochastic system noise. A key component of our algorithm is a new technique for addressing bandit optimization of loss functions with memory.

التعلم الآلي التعلم الالي

Sublinear-Time Algorithms for Compressive Phase Retrieval

180 - Yi Li , Vasileios Nakos 2017

In the compressive phase retrieval problem, or phaseless compressed sensing, or compressed sensing from intensity only measurements, the goal is to reconstruct a sparse or approximately $k$-sparse vector $x in mathbb{R}^n$ given access to $y= |Phi x| $, where $|v|$ denotes the vector obtained from taking the absolute value of $vinmathbb{R}^n$ coordinate-wise. In this paper we present sublinear-time algorithms for different variants of the compressive phase retrieval problem which are akin to the variants considered for the classical compressive sensing problem in theoretical computer science. Our algorithms use pure combinatorial techniques and near-optimal number of measurements.

بنى وهياكل البيانات والخوارزميات نظرية المعلومات نظرية المعلومات

Linear and Sublinear Time Spectral Density Estimation

96 - Vladimir Braverman , Aditya Krishnan , Christopher Musco 2021

We analyze the popular kernel polynomial method (KPM) for approximating the spectral density (eigenvalue distribution) of an $ntimes n$ Hermitian matrix $A$. We prove that a simple and practical variant of the KPM algorithm can approximate the spectr al density to $epsilon$ accuracy in the Wasserstein-1 distance with roughly $O({1}/{epsilon})$ matrix-vector multiplications with $A$. This yields a provable linear time result for the problem with better $epsilon$ dependence than prior work. The KPM variant we study is based on damped Chebyshev polynomial expansions. We show that it is stable, meaning that it can be combined with any approximate matrix-vector multiplication algorithm for $A$. As an application, we develop an $O(ncdot text{poly}(1/epsilon))$ time algorithm for computing the spectral density of any $ntimes n$ normalized graph adjacency or Laplacian matrix. This runtime is sublinear in the size of the matrix, and assumes sample access to the graph. Our approach leverages several tools from approximation theory, including Jacksons seminal work on approximation with positive kernels [Jackson, 1912], and stability properties of three-term recurrence relations for orthogonal polynomials.

بنى وهياكل البيانات والخوارزميات التحليل العددي التحليل العددي