No Arabic abstract
We consider online convex optimization when a number k of data points are outliers that may be corrupted. We model this by introducing the notion of robust regret, which measures the regret only on rounds that are not outliers. The aim for the learner is to achieve small robust regret, without knowing where the outliers are. If the outliers are chosen adversarially, we show that a simple filtering strategy on extreme gradients incurs O(k) additive overhead compared to the usual regret bounds, and that this is unimprovable, which means that k needs to be sublinear in the number of rounds. We further ask which additional assumptions would allow for a linear number of outliers. It turns out that the usual benign cases of independently, identically distributed (i.i.d.) observations or strongly convex losses are not sufficient. However, combining i.i.d. observations with the assumption that outliers are those observations that are in an extreme quantile of the distribution, does lead to sublinear robust regret, even though the expected number of outliers is linear.
We consider the decision-making framework of online convex optimization with a very large number of experts. This setting is ubiquitous in contextual and reinforcement learning problems, where the size of the policy class renders enumeration and search within the policy class infeasible. Instead, we consider generalizing the methodology of online boosting. We define a weak learning algorithm as a mechanism that guarantees multiplicatively approximate regret against a base class of experts. In this access model, we give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class. We consider both full and partial (a.k.a. bandit) information feedback models. We also give an analogous efficient boosting algorithm for the i.i.d. statistical setting. Our results simultaneously generalize online boosting and gradient boosting guarantees to contextual learning model, online convex optimization and bandit linear optimization settings.
We introduce a new online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions observed so far. This is in contrast to previous algorithms that use a fixed regularization function such as L2-squared, and modify it only via a single time-dependent parameter. Our algorithms regret bounds are worst-case optimal, and for certain realistic classes of loss functions they are much better than existing bounds. These bounds are problem-dependent, which means they can exploit the structure of the actual problem instance. Critically, however, our algorithm does not need to know this structure in advance. Rather, we prove competitive guarantees that show the algorithm provides a bound within a constant factor of the best possible bound (of a certain functional form) in hindsight.
We consider the problem of strongly-convex online optimization in presence of adversarial delays; in a T-iteration online game, the feedback of the players query at time t is arbitrarily delayed by an adversary for d_t rounds and delivered before the game ends, at iteration t+d_t-1. Specifically for algo{online-gradient-descent} algorithm we show it has a simple regret bound of Oh{sum_{t=1}^T log (1+ frac{d_t}{t})}. This gives a clear and simple bound without resorting any distributional and limiting assumptions on the delays. We further show how this result encompasses and generalizes several of the existing known results in the literature. Specifically it matches the celebrated logarithmic regret Oh{log T} when there are no delays (i.e. d_t = 1) and regret bound of Oh{tau log T} for constant delays d_t = tau.
Some of the most compelling applications of online convex optimization, including online prediction and classification, are unconstrained: the natural feasible set is R^n. Existing algorithms fail to achieve sub-linear regret in this setting unless constraints on the comparator point x^* are known in advance. We present algorithms that, without such prior knowledge, offer near-optimal regret bounds with respect to any choice of x^*. In particular, regret with respect to x^* = 0 is constant. We then prove lower bounds showing that our guarantees are near-optimal in this setting.
We consider online convex optimization (OCO) over a heterogeneous network with communication delay, where multiple workers together with a master execute a sequence of decisions to minimize the accumulation of time-varying global costs. The local data may not be independent or identically distributed, and the global cost functions may not be locally separable. Due to communication delay, neither the master nor the workers have in-time information about the current global cost function. We propose a new algorithm, termed Hierarchical OCO (HiOCO), which takes full advantage of the network heterogeneity in information timeliness and computation capacity to enable multi-step gradient descent at both the workers and the master. We analyze the impacts of the unique hierarchical architecture, multi-slot delay, and gradient estimation error to derive upper bounds on the dynamic regret of HiOCO, which measures the gap of costs between HiOCO and an offline globally optimal performance benchmark.