Deep Online Learning with Stochastic Constraints

Published by Guy Uziel

Publication date: 2019

Research field: Informatics Engineering, Mathematical Statistics

Paper language: English

Author: Guy Uziel

Keywords: Machine learning

Abstract

Deep learning models are considered state-of-the-art in many offline machine learning tasks. However, many of the techniques developed for them are not suitable for online learning tasks. Using deep learning models with sequential data becomes even harder when several loss functions must be considered simultaneously, as in many real-world applications. In this paper, we therefore propose a novel online deep learning training procedure that can be used regardless of the neural network's architecture and is aimed at the multiple-objective case. We demonstrate the effectiveness of our algorithm on the Neyman-Pearson classification problem on several benchmark datasets.
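The abstract does not spell out the training procedure. As a rough illustration of what handling two losses online can look like, the sketch below applies a generic primal-dual (Lagrangian) update to a Neyman-Pearson-style problem: reduce the loss on positives while keeping a surrogate loss on negatives below a target `alpha`. This is not the paper's algorithm. The linear logistic model, the name `online_primal_dual`, the step sizes, the toy data, and the use of log-loss as a stand-in for the false-alarm rate are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def online_primal_dual(stream, dim, alpha=0.1, lr_w=0.1, lr_lam=0.05):
    """Generic online primal-dual sketch (hypothetical, not the paper's method).

    stream yields (x, y) with x a list of floats and y in {0, 1}.
    Objective: log-loss on positives. Constraint: log-loss on negatives <= alpha.
    """
    w = [0.0] * dim
    lam = 0.0  # Lagrange multiplier for the constraint, kept non-negative
    for x, y in stream:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        if y == 1:
            # Gradient of -log(p) with respect to the logit is (p - 1).
            scale = p - 1.0
        else:
            # Gradient of -log(1 - p) with respect to the logit is p,
            # weighted by the current multiplier.
            scale = lam * p
            neg_loss = -math.log(max(1.0 - p, 1e-12))
            # Dual ascent: raise lam while the constraint is violated.
            lam = max(0.0, lam + lr_lam * (neg_loss - alpha))
        # Primal descent on the Lagrangian for this single example.
        w = [wi - lr_w * scale * xi for wi, xi in zip(w, x)]
    return w, lam


# Toy stream (assumed data): one informative feature plus a bias term.
random.seed(0)
stream = []
for _ in range(2000):
    y = random.randint(0, 1)
    stream.append(([random.gauss(2.0 if y else -2.0, 1.0), 1.0], y))

w, lam = online_primal_dual(stream, dim=2)
```

The multiplier `lam` grows while the surrogate loss on negatives exceeds `alpha` and shrinks toward zero once the constraint is met, so it acts as an automatically tuned trade-off weight between the two losses.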

Masrour Zoghi, Tomas Tunys, Mohammad Ghavamzadeh (2017)

Online learning to rank is a core problem in information retrieval and machine learning. Many provably efficient algorithms have recently been proposed for this problem in specific click models. A click model is a model of how the user interacts with a list of documents. Though these results are significant, their impact on practice is limited, because all proposed algorithms are designed for specific click models and lack convergence guarantees in other models. In this work, we propose BatchRank, the first online learning to rank algorithm for a broad class of click models. The class encompasses the two most fundamental click models, the cascade and position-based models. We derive a gap-dependent upper bound on the $T$-step regret of BatchRank and evaluate it on a range of web search queries. We observe that BatchRank outperforms ranked bandits and is more robust than CascadeKL-UCB, an existing algorithm for the cascade model.

Stochastic Bandits with Linear Constraints

Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett (2020)

We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies whose expected cumulative reward over the course of $T$ rounds is maximum, and each of which has an expected cost below a certain threshold $\tau$. We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove an $\widetilde{\mathcal{O}}(\frac{d\sqrt{T}}{\tau-c_0})$ bound on its $T$-round regret, where the denominator is the difference between the constraint threshold and the cost of a known feasible action. We further specialize our results to multi-armed bandits and propose a computationally efficient algorithm for this setting. We prove a regret bound of $\widetilde{\mathcal{O}}(\frac{\sqrt{KT}}{\tau - c_0})$ for this algorithm in $K$-armed bandits, which is a $\sqrt{K}$ improvement over the regret bound we obtain by simply casting multi-armed bandits as an instance of contextual linear bandits and using the regret bound of OPLB. We also prove a lower bound for the problem studied in the paper and provide simulations to validate our theoretical results.

Online Antenna Tuning in Heterogeneous Cellular Networks with Deep Reinforcement Learning

Eren Balevi, Jeffrey G. Andrews (2019)

We aim to jointly optimize the antenna tilt angle and the vertical and horizontal half-power beamwidths of the macrocells in a heterogeneous cellular network (HetNet). The interactions between the cells, most notably due to their coupled interference, render this optimization prohibitively complex. Utilizing a single-agent reinforcement learning (RL) algorithm for this optimization is quite suboptimal despite its scalability, whereas multi-agent RL algorithms yield better solutions at the expense of scalability. Hence, we propose a compromise algorithm between these two. Specifically, a multi-agent mean field RL algorithm is first utilized in the offline phase so as to transfer information as features for the second (online) phase single-agent RL algorithm, which employs a deep neural network to learn users' locations. This two-step approach is a practical solution for real deployments, which should automatically adapt to environmental changes in the network. Our results illustrate that the proposed algorithm approaches the performance of the multi-agent RL, which requires millions of trials, with hundreds of online trials, assuming relatively low environmental dynamics, and performs much better than a single-agent RL. Furthermore, the proposed algorithm is compact and implementable, and empirically appears to provide a performance guarantee regardless of the amount of environmental dynamics.

Stochastic Variance Reduction for Deep Q-learning

Wei-Ye Zhao, Xi-Ya Guan, Yang Liu (2019)

Recent advances in deep reinforcement learning have achieved human-level performance on a variety of real-world applications. However, the current algorithms still suffer from poor gradient estimation with excessive variance, resulting in unstable training and poor sample efficiency. In our paper, we propose an innovative optimization strategy that utilizes stochastic variance reduced gradient (SVRG) techniques. In extensive experiments on the Atari domain, our method outperforms the deep Q-learning baselines on 18 out of 20 games.
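To make the SVRG idea concrete, here is a minimal sketch of the standard SVRG update on a toy one-dimensional least-squares problem. It only illustrates the variance-reduced gradient estimate; it is not the deep Q-learning variant from the paper. The function name `svrg`, the toy data `a` and `b`, and the step size and loop lengths are assumptions chosen for the example.

```python
import random


def svrg(grad_i, n, w0, lr=0.02, epochs=30, inner_steps=50, seed=0):
    """Plain SVRG for a scalar parameter (illustrative sketch).

    grad_i(w, i) returns the gradient of the i-th component loss at w.
    """
    rng = random.Random(seed)
    w = w0
    for _ in range(epochs):
        # Snapshot the parameter and compute the full gradient there once.
        w_snap = w
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced estimate: unbiased for the full gradient at w,
            # with variance that shrinks as w and w_snap approach the optimum.
            g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w -= lr * g
    return w


# Toy problem (assumed data): minimize the mean of 0.5 * (a_i * w - b_i)^2.
a = [1.0, 2.0, 3.0]
b = [1.0, 5.0, 5.0]
w_hat = svrg(lambda w, i: a[i] * (a[i] * w - b[i]), n=len(a), w0=0.0)

# Closed-form least-squares minimizer, for comparison.
w_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
```

The correction term `grad_i(w_snap, i) - full_grad` has zero mean over `i`, which is why subtracting it keeps the estimate unbiased while reducing its variance.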

Homogeneous Online Transfer Learning with Online Distribution Discrepancy Minimization

Yuntao Du, Zhiwen Tan, Qian Chen (2019)

Transfer learning, which transfers knowledge from related but different source domains to the target domain, has been demonstrated to be successful and essential in diverse applications. Online transfer learning (OTL) is a more challenging problem where the target data arrive in an online manner. Most OTL methods combine the source classifier and the target classifier directly by assigning a weight to each classifier, and adjust the weights constantly. However, these methods pay little attention to reducing the distribution discrepancy between domains. In this paper, we propose a novel online transfer learning method which seeks to find a new feature representation, so that the marginal distribution and conditional distribution discrepancies can be reduced online simultaneously. We focus on online transfer learning with multiple source domains and use the Hedge strategy to leverage knowledge from source domains. We analyze the theoretical properties of the proposed algorithm and provide an upper mistake bound. Comprehensive experiments on two real-world datasets show that our method outperforms state-of-the-art methods by a large margin.
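The Hedge strategy mentioned above is the classical multiplicative-weights rule for combining experts. A minimal sketch of one update step follows; how the paper maps source-domain classifiers to experts and defines their losses is not stated in the abstract, so the three experts, their losses, and the learning rate `eta` are assumed purely for illustration.

```python
import math


def hedge_update(weights, losses, eta=0.5):
    """One Hedge step: scale each weight by exp(-eta * loss), then renormalize."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three hypothetical experts; expert 0 never errs, experts 1 and 2 err twice each.
weights = [1.0 / 3, 1.0 / 3, 1.0 / 3]
for losses in [[0.0, 1.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]:
    weights = hedge_update(weights, losses)
```

After these rounds the weight concentrates on the expert with the lowest cumulative loss, while the two experts with equal cumulative loss end up with equal weight.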
