
Linear Bandit Algorithms with Sublinear Time Complexity

Published by Shuo Yang
Publication date: 2021
Research field: Informatics Engineering
Language: English





We propose to accelerate existing linear bandit algorithms to achieve per-step time complexity sublinear in the number of arms $K$. The key to sublinear complexity is the realization that the arm selection in many linear bandit algorithms reduces to the maximum inner product search (MIPS) problem. Correspondingly, we propose an algorithm that approximately solves the MIPS problem for a sequence of adaptive queries yielding near-linear preprocessing time complexity and sublinear query time complexity. Using the proposed MIPS solver as a sub-routine, we present two bandit algorithms (one based on UCB, and the other based on TS) that achieve sublinear time complexity. We explicitly characterize the tradeoff between the per-step time complexity and regret, and show that our proposed algorithms can achieve $O(K^{1-\alpha(T)})$ per-step complexity for some $\alpha(T) > 0$ and $\widetilde{O}(\sqrt{T})$ regret, where $T$ is the time horizon. Further, we present the theoretical limit of the tradeoff, which provides a lower bound for the per-step time complexity. We also discuss other choices of approximate MIPS algorithms and other applications to linear bandit problems.
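To make the reduction to MIPS concrete, the following is a minimal sketch, not the paper's algorithm. It shows that one Thompson-sampling arm choice amounts to a single inner-product maximization over the arm set. The function names (`mips_brute_force`, `ts_select_arm`), the isotropic perturbation of the estimate, and the toy arm set are all assumptions made for illustration. The solver here is an exact linear scan costing $O(Kd)$ per step, which is the baseline that an approximate MIPS solver with sublinear query time would replace.

```python
import random


def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mips_brute_force(arms, query):
    """Exact MIPS by linear scan: index of the arm maximizing <arm, query>.

    Costs O(K * d) for K arms of dimension d. This is only a stand-in for
    an approximate solver with sublinear query time.
    """
    return max(range(len(arms)), key=lambda i: inner(arms[i], query))


def ts_select_arm(arms, theta_hat, noise_scale, rng):
    """One Thompson-sampling arm choice expressed as a MIPS query.

    Assumption: the posterior is simplified to an isotropic Gaussian around
    theta_hat; a full linear TS would sample using the design matrix.
    """
    theta_tilde = [t + noise_scale * rng.gauss(0.0, 1.0) for t in theta_hat]
    # Arm selection is argmax_i <x_i, theta_tilde>, i.e. a MIPS query.
    return mips_brute_force(arms, theta_tilde)


# Hypothetical toy instance: K = 3 arms in d = 2 dimensions.
arms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
rng = random.Random(0)
chosen = ts_select_arm(arms, theta_hat=[0.2, 0.9], noise_scale=0.0, rng=rng)
print(chosen)
```

With `noise_scale=0.0` the sampled parameter equals the estimate, so the call reduces to a plain greedy choice. Passing a positive `noise_scale` randomizes the query vector while leaving the search step unchanged, which is why only the search routine needs to be accelerated.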


Read also

We study the problem of stochastic combinatorial pure exploration (CPE), where an agent sequentially pulls a set of single arms (a.k.a. a super arm) and tries to find the best super arm. Among a variety of problem settings of the CPE, we focus on the full-bandit setting, where we cannot observe the reward of each single arm, but only the sum of the rewards. Although we can regard the CPE with full-bandit feedback as a special case of pure exploration in linear bandits, an approach based on linear bandits is not computationally feasible since the number of super arms may be exponential. In this paper, we first propose a polynomial-time bandit algorithm for the CPE under general combinatorial constraints and provide an upper bound of the sample complexity. Second, we design an approximation algorithm for the 0-1 quadratic maximization problem, which arises in many bandit algorithms with confidence ellipsoids. Based on our approximation algorithm, we propose novel bandit algorithms for the top-k selection problem, and prove that our algorithms run in polynomial time. Finally, we conduct experiments on synthetic and real-world datasets, and confirm the validity of our theoretical analysis in terms of both the computation time and the sample complexity.
We study the problem of corralling stochastic bandit algorithms, that is combining multiple bandit algorithms designed for a stochastic environment, with the goal of devising a corralling algorithm that performs almost as well as the best base algorithm. We give two general algorithms for this setting, which we show benefit from favorable regret guarantees. We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward, and depends on the gap between the highest reward and other rewards.
Asaf Cassel, 2020
We consider the problem of controlling a known linear dynamical system under stochastic noise, adversarially chosen costs, and bandit feedback. Unlike the full feedback setting where the entire cost function is revealed after each decision, here only the cost incurred by the learner is observed. We present a new and efficient algorithm that, for strongly convex and smooth costs, obtains regret that grows with the square root of the time horizon $T$. We also give extensions of this result to general convex, possibly non-smooth costs, and to non-stochastic system noise. A key component of our algorithm is a new technique for addressing bandit optimization of loss functions with memory.
Yi Li, Vasileios Nakos, 2017
In the compressive phase retrieval problem, or phaseless compressed sensing, or compressed sensing from intensity only measurements, the goal is to reconstruct a sparse or approximately $k$-sparse vector $x \in \mathbb{R}^n$ given access to $y = |\Phi x|$, where $|v|$ denotes the vector obtained from taking the absolute value of $v \in \mathbb{R}^n$ coordinate-wise. In this paper we present sublinear-time algorithms for different variants of the compressive phase retrieval problem which are akin to the variants considered for the classical compressive sensing problem in theoretical computer science. Our algorithms use pure combinatorial techniques and near-optimal number of measurements.
We analyze the popular kernel polynomial method (KPM) for approximating the spectral density (eigenvalue distribution) of an $n \times n$ Hermitian matrix $A$. We prove that a simple and practical variant of the KPM algorithm can approximate the spectral density to $\epsilon$ accuracy in the Wasserstein-1 distance with roughly $O(1/\epsilon)$ matrix-vector multiplications with $A$. This yields a provable linear time result for the problem with better $\epsilon$ dependence than prior work. The KPM variant we study is based on damped Chebyshev polynomial expansions. We show that it is stable, meaning that it can be combined with any approximate matrix-vector multiplication algorithm for $A$. As an application, we develop an $O(n \cdot \text{poly}(1/\epsilon))$ time algorithm for computing the spectral density of any $n \times n$ normalized graph adjacency or Laplacian matrix. This runtime is sublinear in the size of the matrix, and assumes sample access to the graph. Our approach leverages several tools from approximation theory, including Jackson's seminal work on approximation with positive kernels [Jackson, 1912], and stability properties of three-term recurrence relations for orthogonal polynomials.
