Learning Under Delayed Feedback: Implicitly Adapting to Gradient Delays


Published by Rotem Zamir Aviv

Publication date: 2021

Research field: Informatics Engineering

Language: English

Author: Rotem Zamir Aviv

Machine Learning


Abstract

We consider stochastic convex optimization problems in which several machines act asynchronously in parallel while sharing a common memory. We propose a robust training method for the constrained setting and derive non-asymptotic convergence guarantees that do not depend on prior knowledge of update delays, objective smoothness, or gradient variance. In contrast, existing methods for this setting crucially rely on this prior knowledge, which renders them unsuitable for essentially all shared-resource computational environments, such as clouds and data centers. Concretely, existing approaches are unable to accommodate changes in the delays that result from dynamic allocation of the machines, while our method implicitly adapts to such changes.
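To make the setting concrete, the following is a minimal sketch of projected SGD in which every update uses a stochastic gradient computed at a stale iterate, as happens when asynchronous workers read from shared memory. It is not the method proposed in the paper: the function names, the one-dimensional quadratic objective, the uniform random delay, and the fixed step size are all illustrative assumptions. In particular, the fixed step size here is chosen with the maximum delay in mind, which is exactly the kind of prior knowledge the abstract says the proposed method does not need.

```python
import random


def delayed_projected_sgd(grad, x0, steps, lr, max_delay, radius, seed=0):
    """Projected SGD on the interval [-radius, radius] with stale gradients.

    Illustrative baseline only (hypothetical helper, not the paper's algorithm).
    """
    rng = random.Random(seed)
    history = [x0]  # all past iterates, so that stale ones can be read back
    x = x0
    for _ in range(steps):
        # Pick a delay that never reaches further back than the history allows.
        delay = rng.randint(0, min(max_delay, len(history) - 1))
        stale_x = history[-1 - delay]      # iterate a worker read earlier
        g = grad(stale_x, rng)             # stochastic gradient at the stale point
        x = x - lr * g                     # gradient step with a fixed step size
        x = max(-radius, min(radius, x))   # projection onto the constraint set
        history.append(x)
    return x


def noisy_quadratic_grad(x, rng):
    # Gradient of f(x) = 0.5 * (x - 1)^2 plus zero-mean Gaussian noise.
    return (x - 1.0) + rng.gauss(0.0, 0.1)


x_final = delayed_projected_sgd(
    noisy_quadratic_grad, x0=-2.0, steps=2000, lr=0.01, max_delay=5, radius=3.0
)
print(x_final)  # should end close to the minimizer x = 1
```

With a small step size relative to the delay, the iterates still approach the minimizer; raising `lr` or `max_delay` in this sketch shows how a step size tuned for one delay regime can become unstable in another.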


Raisa Dzhamtyrova, Carsten Maple, 2020

The increasing connectivity of data and cyber-physical systems has resulted in a growing number of cyber-attacks. Real-time detection of such attacks, through the identification of anomalous activity, is required so that mitigation and contingent actions can be effectively and rapidly deployed. We propose a new approach for aggregating unsupervised anomaly detection algorithms and incorporating feedback when it becomes available. We apply this approach to open-source real datasets and show that both aggregating models, which we call experts, and incorporating feedback significantly improve the performance. An important property of the proposed approaches is the theoretical guarantee that, in terms of cumulative average loss, they perform close to the best superexpert, which can switch between the best-performing experts.

Machine Learning

Learning Multiclass Classifier Under Noisy Bandit Feedback

Mudit Agarwal, Naresh Manwani, 2020

This paper addresses the problem of multiclass classification with corrupted or noisy bandit feedback. In this setting, the learner may not receive true feedback. Instead, it receives feedback that has been flipped with some non-zero probability. We propose a novel approach to deal with noisy bandit feedback based on the unbiased estimator technique. We further offer a method that can efficiently estimate the noise rates, thus providing an end-to-end framework. The proposed algorithm enjoys a mistake bound of the order of $O(\sqrt{T})$ in the high noise case and of the order of $O(T^{2/3})$ in the worst case. We show our approach's effectiveness using extensive experiments on several benchmark datasets.

Machine Learning

Learning to Communicate Implicitly By Actions

Zheng Tian, Shihao Zou, Ian Davies, 2018

In situations where explicit communication is limited, human collaborators act by learning to: (i) infer the meaning behind their partner's actions, and (ii) convey private information about the state to their partner implicitly through actions. The first component of this learning process has been well studied in multi-agent systems, whereas the second, which is equally crucial for successful collaboration, has not. To mimic both components, thereby completing the learning process, we introduce a novel algorithm: Policy Belief Learning (PBL). PBL uses a belief module to model the other agent's private information and a policy module to form a distribution over actions informed by the belief module. Furthermore, to encourage communication by actions, we propose a novel auxiliary reward that incentivizes one agent to help its partner make correct inferences about its private information. The auxiliary reward for communication is integrated into the learning of the policy module. We evaluate our approach on a set of environments including a matrix game, a particle environment, and the non-competitive bidding problem from contract bridge. We show empirically that this auxiliary reward is effective and easy to generalize. These results demonstrate that our PBL algorithm can produce strong pairs of agents in collaborative games where explicit communication is disabled.

Artificial Intelligence

Stochastic Bandits with Delayed Composite Anonymous Feedback

Siddhant Garg, Aditya Kumar Akash, 2019

We explore a novel setting of the Multi-Armed Bandit (MAB) problem, inspired by real-world applications, which we call bandits with stochastic delayed composite anonymous feedback (SDCAF). In SDCAF, the rewards for pulling arms are stochastic with respect to time but spread over a fixed number of time steps in the future after pulling the arm. The complexity of this problem stems from the anonymous feedback to the player and the stochastic generation of the reward. Due to the aggregated nature of the rewards, the player is unable to associate a reward with a particular time step from the past. We present two algorithms for this more complicated setting of SDCAF using phase-based extensions of the UCB algorithm. We perform regret analysis to show sub-linear theoretical guarantees for both algorithms.

Machine Learning

Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments

Łukasz Kidzinski, Sharada Prasanna Mohanty, Carmichael Ong, 2018

In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms.

Machine Learning, Artificial Intelligence