Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Online optimal task offloading with one-bit feedback

58 0 0.0 ( 0 )

Download Cite

Added by Shangshu Zhao

Publication date 2018

fields Informatics Engineering Mathematical Statistics

and research's language is English

Authors Shangshu Zhao - Zhaowei Zhu - Fuqian Yang

Machine Learning Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Task offloading is an emerging technology in fog-enabled networks. It allows users to transmit tasks to neighbor fog nodes so as to utilize the computing resources of the networks. In this paper, we investigate a stochastic task offloading model and propose a multi-armed bandit framework to formulate this model. We consider the fact that different helper nodes prefer different kinds of tasks. Further, we assume each helper node just feeds back one-bit information to the task node to indicate the level of happiness. The key challenge of this problem lies in the exploration-exploitation tradeoff. We thus implement a UCB-type algorithm to maximize the long-term happiness metric. Numerical simulations are given in the end of the paper to corroborate our strategy.

rate research

Online Semi-Supervised Learning with Bandit Feedback

113 - Sohini Upadhyay , Mikhail Yurochkin , Mayank Agarwal 2020

We formulate a new problem at the intersectionof semi-supervised learning and contextual bandits,motivated by several applications including clini-cal trials and ad recommendations. We demonstratehow Graph Convolutional Network (GCN), a semi-supervised learning approach, can be adjusted tothe new problem formulation. We also propose avariant of the linear contextual bandit with semi-supervised missing rewards imputation. We thentake the best of both approaches to develop multi-GCN embedded contextual bandit. Our algorithmsare verified on several real world datasets.

Machine Learning Machine Learning

Online learning with feedback graphs and switching costs

385 - Anshuka Rangi , Massimo Franceschetti 2018

We study online learning when partial feedback information is provided following every action of the learning process, and the learner incurs switching costs for changing his actions. In this setting, the feedback information system can be represented by a graph, and previous works studied the expected regret of the learner in the case of a clique (Expert setup), or disconnected single loops (Multi-Armed Bandits (MAB)). This work provides a lower bound on the expected regret in the Partial Information (PI) setting, namely for general feedback graphs --excluding the clique. Additionally, it shows that all algorithms that are optimal without switching costs are necessarily sub-optimal in the presence of switching costs, which motivates the need to design new algorithms. We propose two new algorithms: Threshold Based EXP3 and EXP3. SC. For the two special cases of symmetric PI setting and MAB, the expected regret of both of these algorithms is order optimal in the duration of the learning process. Additionally, Threshold Based EXP3 is order optimal in the switching cost, whereas EXP3. SC is not. Finally, empirical evaluations show that Threshold Based EXP3 outperforms the previously proposed order-optimal algorithms EXP3 SET in the presence of switching costs, and Batch EXP3 in the MAB setting with switching costs.

Machine Learning Machine Learning

Online Markov Decision Processes with Aggregate Bandit Feedback

358 - Alon Cohen , Haim Kaplan , Tomer Koren 2021

We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics. In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback: the trajectory is revealed along with the cumulative loss suffered, rather than the individual losses encountered along the trajectory. Our main result is a computationally efficient algorithm with $O(sqrt{K})$ regret for this setting, where $K$ is the number of episodes. We establish this result via an efficient reduction to a novel bandit learning setting we call Distorted Linear Bandits (DLB), which is a variant of bandit linear optimization where actions chosen by the learner are adversarially distorted before they are committed. We then develop a computationally-efficient online algorithm for DLB for which we prove an $O(sqrt{T})$ regret bound, where $T$ is the number of time steps. Our algorithm is based on online mirror descent with a self-concordant barrier regularization that employs a novel increasing learning rate schedule.

Machine Learning Machine Learning

Opportunistic Collaborative Beamforming with One-Bit Feedback

471 - Man-On Pun , D. Richard Brown III , H. Vincent Poor 2008

An energy-efficient opportunistic collaborative beamformer with one-bit feedback is proposed for ad hoc sensor networks over Rayleigh fading channels. In contrast to conventional collaborative beamforming schemes in which each source node uses channel state information to correct its local carrier offset and channel phase, the proposed beamforming scheme opportunistically selects a subset of source nodes whose received signals combine in a quasi-coherent manner at the intended receiver. No local phase-precompensation is performed by the nodes in the opportunistic collaborative beamformer. As a result, each node requires only one-bit of feedback from the destination in order to determine if it should or shouldnt participate in the collaborative beamformer. Theoretical analysis shows that the received signal power obtained with the proposed beamforming scheme scales linearly with the number of available source nodes. Since the the optimal node selection rule requires an exhaustive search over all possible subsets of source nodes, two low-complexity selection algorithms are developed. Simulation results confirm the effectiveness of opportunistic collaborative beamforming with the low-complexity selection algorithms.

Information Theory Information Theory

Boosting One-Point Derivative-Free Online Optimization via Residual Feedback

103 - Yan Zhang , Yi Zhou , Kaiyi Ji 2020

Zeroth-order optimization (ZO) typically relies on two-point feedback to estimate the unknown gradient of the objective function. Nevertheless, two-point feedback can not be used for online optimization of time-varying objective functions, where only a single query of the function value is possible at each time step. In this work, we propose a new one-point feedback method for online optimization that estimates the objective function gradient using the residual between two feedback points at consecutive time instants. Moreover, we develop regret bounds for ZO with residual feedback for both convex and nonconvex online optimization problems. Specifically, for both deterministic and stochastic problems and for both Lipschitz and smooth objective functions, we show that using residual feedback can produce gradient estimates with much smaller variance compared to conventional one-point feedback methods. As a result, our regret bounds are much tighter compared to existing regret bounds for ZO with conventional one-point feedback, which suggests that ZO with residual feedback can better track the optimizer of online optimization problems. Additionally, our regret bounds rely on weaker assumptions than those used in conventional one-point feedback methods. Numerical experiments show that ZO with residual feedback significantly outperforms existing one-point feedback methods also in practice.

Machine Learning Optimization and Control

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Online optimal task offloading with one-bit feedback

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions