A One-Size-Fits-All Solution to Conservative Bandit Problems


Published by Yihan Du

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Yihan Du, Siwei Wang, Longbo Huang

Machine learning



In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e., conservative multi-armed bandits (CMAB), conservative linear bandits (CLB) and conservative contextual combinatorial bandits (CCCB). Unlike previous works, which consider high-probability constraints on the expected reward, we focus on a sample-path constraint on the actually received reward, and achieve better theoretical guarantees ($T$-independent additive regrets instead of $T$-dependent ones) and better empirical performance. Furthermore, we extend the results and consider a novel conservative mean-variance bandit problem (MV-CBP), which measures the learning performance by both the expected reward and its variability. For this extended problem, we provide a novel algorithm with $O(1/T)$ normalized additive regrets ($T$-independent in the cumulative form) and validate this result through empirical evaluation.
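The sample-path constraint described above can be illustrated with a minimal sketch. This is not the authors' algorithm; it is a plain UCB learner wrapped in a conservative check, under several assumptions that do not come from the abstract: rewards lie in [0, 1], the baseline arm pays a known deterministic reward `mu0`, and the constraint takes the form "cumulative received reward after round t is at least (1 - alpha) * mu0 * t", where `alpha` is a hypothetical tolerance parameter. All function and variable names are illustrative.

```python
import math
import random


def conservative_ucb(means, mu0, alpha, horizon, seed=0):
    """Illustrative conservative UCB on Bernoulli arms (not the paper's method).

    Assumed constraint: after every round t, the cumulative reward actually
    received is at least (1 - alpha) * mu0 * t.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # exploratory pulls per arm
    sums = [0.0] * k      # summed rewards per arm
    total = 0.0           # cumulative received reward (sample path)
    history = []          # cumulative reward after each round

    for t in range(1, horizon + 1):
        required = (1.0 - alpha) * mu0 * t
        # An exploratory pull may return 0 in the worst case, so explore only
        # if the constraint at round t already holds without any new reward.
        if total >= required:
            untried = [i for i in range(k) if counts[i] == 0]
            if untried:
                arm = untried[0]
            else:
                arm = max(
                    range(k),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]),
                )
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
        else:
            # Otherwise fall back to the baseline, which pays mu0 for certain.
            reward = mu0
        total += reward
        history.append(total)
    return history, counts


history, counts = conservative_ucb([0.3, 0.8], mu0=0.5, alpha=0.1, horizon=2000)
```

Because the check uses the reward already received rather than an estimate of the expected reward, the constraint in this sketch holds on every realized trajectory, not only with high probability.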


125 - Ignacio Morer, Alessio Cardillo, Albert Diaz-Guilera 2018

Spatial networks are a powerful framework for studying a large variety of systems belonging to a broad diversity of contexts: from transportation to biology, from epidemiology to communications and migrations, to cite a few. Spatial networks can be described in terms of their total cost (i.e., the total amount of resources needed for building or traveling their connections). Here, we address the issue of how to gauge and compare the quality of spatial network designs (i.e., efficiency vs. total cost) by proposing a two-step methodology. First, we assess a network's design by introducing a quality function based on the concept of network efficiency. Second, we propose an algorithm to computationally estimate the upper bound of our quality function for a given network. In addition, we provide a universal expression to obtain an approximate upper bound for any spatial network, regardless of its size. Smaller differences between the upper bound and the empirical value correspond to better designs. Finally, we test the applicability of this analytic tool-set on spatial network data-sets of different nature.
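The two quantities contrasted above, total cost and efficiency, can be computed for a toy network as in the following sketch. The abstract does not define the quality function, so the code assumes the standard global efficiency (the average of inverse shortest-path lengths over ordered node pairs) and takes cost as the summed Euclidean length of the edges. The function name and the example network are hypothetical.

```python
import math


def total_cost_and_efficiency(positions, edges):
    """Return (total edge length, global efficiency) of a spatial network.

    Assumes distinct node positions, undirected edges, and no repeated edges.
    """
    n = len(positions)
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    cost = 0.0
    for i, j in edges:
        length = math.dist(positions[i], positions[j])  # Euclidean edge length
        cost += length
        dist[i][j] = dist[j][i] = min(dist[i][j], length)

    # Floyd-Warshall all-pairs shortest paths along the network.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]

    # Disconnected pairs contribute zero to the efficiency.
    inv_sum = sum(
        1.0 / dist[i][j]
        for i in range(n)
        for j in range(n)
        if i != j and dist[i][j] != math.inf
    )
    return cost, inv_sum / (n * (n - 1))


# Unit square connected as a 4-cycle.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cost, eff = total_cost_and_efficiency(square, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

For the unit-square cycle, adjacent nodes are at path distance 1 and opposite nodes at distance 2, so the efficiency is 10/12 at a total cost of 4.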

Physics and society, Social and information networks

Conservative Contextual Combinatorial Cascading Bandit

169 - Kun Wang, Canzhe Zhao, Shuai Li 2021

A conservative mechanism is a desirable property in decision-making problems that balance the tradeoff between exploration and exploitation. We propose the novel conservative contextual combinatorial cascading bandit ($C^4$-bandit), a cascading online learning game which incorporates the conservative mechanism. At each time step, the learning agent is given some contexts and has to recommend a list of items that performs no worse than the base strategy, and then observes the reward according to some stopping rules. We design the $C^4$-UCB algorithm to solve the problem and prove its $n$-step regret upper bound for two situations: known baseline reward and unknown baseline reward. The regret in both situations can be decomposed into two terms: (a) the upper bound for the general contextual combinatorial cascading bandit; and (b) a constant term for the regret from the conservative mechanism. We also improve the bound of the conservative contextual combinatorial bandit as a by-product. Experiments on synthetic data demonstrate its advantages and validate our theoretical analysis.

Machine learning

One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks

384 - Atish Agarwala, Abhimanyu Das, Brendan Juba 2021

Can deep learning solve multiple tasks simultaneously, even when they are unrelated and very different? We investigate how the representations of the underlying tasks affect the ability of a single neural network to learn them jointly. We present theoretical and empirical findings that a single neural network is capable of simultaneously learning multiple tasks from a combined data set, for a variety of methods for representing tasks, for example when the distinct tasks are encoded by well-separated clusters or decision trees over certain task-code attributes. More concretely, we present a novel analysis that shows that families of simple programming-like constructs for the codes encoding the tasks are learnable by two-layer neural networks with standard training. We study more generally how the complexity of learning such combined tasks grows with the complexity of the task codes; we find that combining many tasks may incur a sample complexity penalty, even though the individual tasks are easy to learn. We provide empirical support for the usefulness of the learning bounds by training networks on clusters, decision trees, and SQL-style aggregation.

Machine learning, Artificial intelligence

No Size Fits All: Automated Radio Configuration for LPWANs

141 - Zerina Kapetanovic, Deepak Vasisht, Tusher Chakraborty 2021

Low-power, long-range networks like LoRa have become increasingly mainstream for Internet of Things deployments. Given the versatility of applications that these protocols enable, they support many data rates and bandwidths. Yet, for a given network that supports hundreds of devices over multiple miles, the network operator typically needs to specify the same configuration, or one of a small subset of configurations, for all the client devices to communicate with the gateway. This one-size-fits-all approach is highly inefficient in large networks. We propose an alternative approach: we allow network devices to transmit at any data rate they choose. The gateway uses the first few symbols in the preamble to classify the correct data rate, switches its configuration, and then decodes the data. Our design leverages the inherent asymmetry in outdoor IoT deployments where the clients are power-starved and resource-constrained, but the gateway is not. Our gateway design, Proteus, runs a neural network architecture and is backward compatible with existing LoRa protocols. Our experiments reveal that Proteus can identify the correct configuration with over 97% accuracy in both indoor and outdoor deployments. Our network architecture leads to a 3.8 to 11 times increase in throughput for our LoRa testbed.

Networking and Internet architecture, Machine learning

Efficient Online-Bandit Strategies for Minimax Learning Problems

89 - Christophe Roux, Elias Wirth, Sebastian Pokutta 2021

Several learning problems involve solving min-max problems, e.g., empirical distributional robust learning or learning with non-standard aggregated losses. More specifically, these problems are convex-linear problems where the minimization is carried out over the model parameters $w \in \mathcal{W}$ and the maximization over the empirical distribution $p \in \mathcal{K}$ of the training set indexes, where $\mathcal{K}$ is the simplex or a subset of it. To design efficient methods, we let an online learning algorithm play against a (combinatorial) bandit algorithm. We argue that the efficiency of such approaches critically depends on the structure of $\mathcal{K}$ and propose two properties of $\mathcal{K}$ that facilitate designing efficient algorithms. We focus on a specific family of sets $\mathcal{S}_{n,k}$ encompassing various learning applications and provide high-probability convergence guarantees to the minimax values.

Machine learning, Optimization and control