Adversarial Delays in Online Strongly-Convex Optimization

189 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Daniel Khashabi Mr.

تاريخ النشر 2016

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Daniel Khashabi - Kent Quanrud - Amirhossein Taghvaei

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We consider the problem of strongly-convex online optimization in presence of adversarial delays; in a T-iteration online game, the feedback of the players query at time t is arbitrarily delayed by an adversary for d_t rounds and delivered before the game ends, at iteration t+d_t-1. Specifically for algo{online-gradient-descent} algorithm we show it has a simple regret bound of Oh{sum_{t=1}^T log (1+ frac{d_t}{t})}. This gives a clear and simple bound without resorting any distributional and limiting assumptions on the delays. We further show how this result encompasses and generalizes several of the existing known results in the literature. Specifically it matches the celebrated logarithmic regret Oh{log T} when there are no delays (i.e. d_t = 1) and regret bound of Oh{tau log T} for constant delays d_t = tau.

قيم البحث

اقرأ أيضاً

Boosting for Online Convex Optimization

123 - Elad Hazan , Karan Singh 2021

We consider the decision-making framework of online convex optimization with a very large number of experts. This setting is ubiquitous in contextual and reinforcement learning problems, where the size of the policy class renders enumeration and sear ch within the policy class infeasible. Instead, we consider generalizing the methodology of online boosting. We define a weak learning algorithm as a mechanism that guarantees multiplicatively approximate regret against a base class of experts. In this access model, we give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class. We consider both full and partial (a.k.a. bandit) information feedback models. We also give an analogous efficient boosting algorithm for the i.i.d. statistical setting. Our results simultaneously generalize online boosting and gradient boosting guarantees to contextual learning model, online convex optimization and bandit linear optimization settings.

التعلم الآلي التعلم الالي

Adaptive Bound Optimization for Online Convex Optimization

555 - H. Brendan McMahan , Matthew Streeter 2010

We introduce a new online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions observed so far. This is in contrast to previous algorithms that use a fixed regularization function such as L2-sq uared, and modify it only via a single time-dependent parameter. Our algorithms regret bounds are worst-case optimal, and for certain realistic classes of loss functions they are much better than existing bounds. These bounds are problem-dependent, which means they can exploit the structure of the actual problem instance. Critically, however, our algorithm does not need to know this structure in advance. Rather, we prove competitive guarantees that show the algorithm provides a bound within a constant factor of the best possible bound (of a certain functional form) in hindsight.

التعلم الآلي

Robust Online Convex Optimization in the Presence of Outliers

69 - Tim van Erven , Sarah Sachs , Wouter M. Koolen 2021

We consider online convex optimization when a number k of data points are outliers that may be corrupted. We model this by introducing the notion of robust regret, which measures the regret only on rounds that are not outliers. The aim for the learne r is to achieve small robust regret, without knowing where the outliers are. If the outliers are chosen adversarially, we show that a simple filtering strategy on extreme gradients incurs O(k) additive overhead compared to the usual regret bounds, and that this is unimprovable, which means that k needs to be sublinear in the number of rounds. We further ask which additional assumptions would allow for a linear number of outliers. It turns out that the usual benign cases of independently, identically distributed (i.i.d.) observations or strongly convex losses are not sufficient. However, combining i.i.d. observations with the assumption that outliers are those observations that are in an extreme quantile of the distribution, does lead to sublinear robust regret, even though the expected number of outliers is linear.

التعلم الآلي

Interpreting Robust Optimization via Adversarial Influence Functions

220 - Zhun Deng , Cynthia Dwork , Jialiang Wang 2020

Robust optimization has been widely used in nowadays data science, especially in adversarial training. However, little research has been done to quantify how robust optimization changes the optimizers and the prediction losses comparing to standard t raining. In this paper, inspired by the influence function in robust statistics, we introduce the Adversarial Influence Function (AIF) as a tool to investigate the solution produced by robust optimization. The proposed AIF enjoys a closed-form and can be calculated efficiently. To illustrate the usage of AIF, we apply it to study model sensitivity -- a quantity defined to capture the change of prediction losses on the natural data after implementing robust optimization. We use AIF to analyze how model complexity and randomized smoothing affect the model sensitivity with respect to specific models. We further derive AIF for kernel regressions, with a particular application to neural tangent kernels, and experimentally demonstrate the effectiveness of the proposed AIF. Lastly, the theories of AIF will be extended to distributional robust optimization.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Policy Optimization as Online Learning with Mediator Feedback

125 - Alberto Maria Metelli , Matteo Papini , Pierluca DOro 2020

Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space. The additional available informati on, compared to the standard bandit feedback, allows reusing samples generated by one policy to estimate the performance of other policies. Based on this observation, we propose an algorithm, RANDomized-exploration policy Optimization via Multiple Importance Sampling with Truncation (RANDOMIST), for regret minimization in PO, that employs a randomized exploration strategy, differently from the existing optimistic approaches. When the policy space is finite, we show that under certain circumstances, it is possible to achieve constant regret, while always enjoying logarithmic regret. We also derive problem-dependent regret lower bounds. Then, we extend RANDOMIST to compact policy spaces. Finally, we provide numerical simulations on finite and compact policy spaces, in comparison with PO and bandit baselines.

التعلم الآلي الذكاء الاصطناعي التعلم الالي