In this paper we study the convergence of online gradient descent algorithms in reproducing kernel Hilbert spaces (RKHSs) without regularization. We establish a sufficient condition and a necessary condition for the convergence of excess generalization errors in expectation. A sufficient condition for almost sure convergence is also given. With high probability, we provide explicit convergence rates of the excess generalization errors for both averaged iterates and the last iterate, which in turn imply convergence rates with probability one. To the best of our knowledge, this is the first high-probability convergence rate for the last iterate of online gradient descent algorithms without strong convexity. Without any boundedness assumptions on iterates, our results are derived by a novel use of two measures of the algorithm's one-step progress, by generalization errors and by distances in RKHSs respectively, where the variances of the involved martingales are cancelled out by the descent property of the algorithm.
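As a rough illustration of the unregularized update, the following sketch assumes the least squares loss, a Gaussian kernel on scalar inputs, and a polynomially decaying step size $\eta_t=\eta_1 t^{-\theta}$; these choices, the toy data, and all function names are hypothetical and not taken from the paper. It returns both the last iterate and the uniform average of iterates, the two outputs whose rates are discussed above.

```python
import math
import random


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel on real scalars; an illustrative choice of K."""
    return math.exp(-((x - z) ** 2) / (2.0 * sigma ** 2))


def evaluate(centers, coefs, x, kernel=gaussian_kernel):
    """Evaluate f(x) = sum_s coefs[s] * K(centers[s], x)."""
    return sum(c * kernel(z, x) for z, c in zip(centers, coefs))


def online_kernel_gd(samples, eta1=0.5, theta=0.5, kernel=gaussian_kernel):
    """Unregularized online gradient descent in an RKHS with the least squares loss.

    f_1 = 0 and f_{t+1} = f_t - eta_t * (f_t(x_t) - y_t) * K(x_t, .),
    with eta_t = eta1 * t**(-theta); there is no regularization term and no projection.
    Returns the kernel centers, the coefficients of the last iterate f_{T+1},
    and the coefficients of the average of f_2, ..., f_{T+1}.
    """
    centers, coefs = [], []
    for t, (x, y) in enumerate(samples, start=1):
        eta = eta1 * t ** (-theta)
        residual = evaluate(centers, coefs, x, kernel) - y
        centers.append(x)
        coefs.append(-eta * residual)
    T = len(coefs)
    # The term added at step s belongs to f_{s+1}, ..., f_{T+1}: T - s + 1 of the T averaged iterates.
    avg_coefs = [c * (T - s + 1) / T for s, c in enumerate(coefs, start=1)]
    return centers, coefs, avg_coefs


def sample_sine(T, seed=0):
    """Hypothetical noiseless toy stream: x uniform on [-3, 3], y = sin(x)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(T)]
    return [(x, math.sin(x)) for x in xs]
```

For instance, `centers, last, avg = online_kernel_gd(sample_sine(500))` followed by `evaluate(centers, last, 1.0)` gives the last iterate's prediction at $x=1$. Storing one coefficient per sample makes step $t$ cost $O(t)$, the usual price of the nonparametric kernel representation.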
In this paper we consider online mirror descent (OMD) algorithms, a class of scalable online learning algorithms that exploit the geometric structure of data through mirror maps. Necessary and sufficient conditions are presented in terms of the step size sequence $\{\eta_t\}_{t}$ for the convergence of an OMD algorithm with respect to the expected Bregman distance induced by the mirror map. The condition is $\lim_{t\to\infty}\eta_t=0$ and $\sum_{t=1}^{\infty}\eta_t=\infty$ in the case of positive variances. It reduces to $\sum_{t=1}^{\infty}\eta_t=\infty$ in the case of zero variances, for which linear convergence may be achieved by taking a constant step size sequence. A sufficient condition for almost sure convergence is also given. We establish tight error bounds under mild conditions on the mirror map, the loss function, and the regularizer. Our results are achieved by a novel analysis of the one-step progress of the OMD algorithm using the smoothness and strong convexity of the mirror map and the loss function.
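A minimal sketch of OMD, assuming the negative-entropy mirror map on the probability simplex (the exponentiated-gradient update) and a hypothetical toy objective whose stochastic gradients have positive variance. The step sizes $\eta_t=\eta_1 t^{-\theta}$ with $0<\theta\le 1$ satisfy both $\lim_{t\to\infty}\eta_t=0$ and $\sum_t\eta_t=\infty$, and the Bregman distance induced by negative entropy is the KL divergence. The objective, mirror map, and names are illustrative assumptions, not the paper's setting.

```python
import math
import random


def omd_entropic(p, T, eta1=0.5, theta=0.5, seed=0):
    """Online mirror descent on the probability simplex with the negative-entropy mirror map.

    Toy objective: F(w) = E[0.5 * ||w - e_I||^2] with I ~ p, minimized over the
    simplex at w* = p; the stochastic gradient w - e_I has positive variance.
    Step sizes eta_t = eta1 * t**(-theta), 0 < theta <= 1, satisfy eta_t -> 0 and
    sum_t eta_t = infinity.
    """
    rng = random.Random(seed)
    n = len(p)
    w = [1.0 / n] * n  # start at the uniform distribution
    for t in range(1, T + 1):
        eta = eta1 * t ** (-theta)
        i = rng.choices(range(n), weights=p)[0]
        grad = [w[j] - (1.0 if j == i else 0.0) for j in range(n)]
        # Mirror step: grad psi(w) = 1 + log w, so the dual update is multiplicative ...
        unnormalized = [w[j] * math.exp(-eta * grad[j]) for j in range(n)]
        z = sum(unnormalized)
        # ... and the Bregman (KL) projection back onto the simplex is a normalization.
        w = [u / z for u in unnormalized]
    return w


def kl(p, q):
    """Bregman distance of negative entropy on the simplex, i.e. KL(p || q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Setting `theta=0` gives a constant step size, which satisfies only $\sum_t\eta_t=\infty$; with the positive-variance toy objective one would then expect the iterates to keep fluctuating around $w^*$ instead of converging, in line with the necessity of $\eta_t\to 0$ stated above.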
This paper leverages machine-learned predictions to design competitive algorithms for online conversion problems, with the goal of improving the competitive ratio when predictions are accurate (i.e., consistency) while also guaranteeing a worst-case competitive ratio regardless of prediction quality (i.e., robustness). We unify the algorithmic design of both integral and fractional conversion problems, also known as the 1-max-search and one-way trading problems, into a class of online threshold-based algorithms (OTA). By incorporating predictions into the design of OTA, we achieve the Pareto-optimal trade-off between consistency and robustness, i.e., no online algorithm can achieve a better consistency guarantee for a given robustness guarantee. We demonstrate the performance of OTA using numerical experiments on Bitcoin conversion.
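For reference, the classical prediction-free threshold rule for 1-max-search (the integral conversion problem) is sketched below: with all prices known to lie in $[L, U]$, accept the first price at or above the reservation price $\sqrt{LU}$, which guarantees a worst-case competitive ratio of $\sqrt{U/L}$. It is one simple member of the threshold-based family, not the learning-augmented OTA from the paper; the function name and the forced-sale-at-the-deadline convention are assumptions.

```python
import math


def one_max_search(prices, L, U):
    """Classical reservation-price rule for 1-max-search (sell one unit).

    Assumes every price lies in [L, U] with 0 < L <= U and that the unit must be
    sold by the last step. Accepts the first price >= sqrt(L * U); otherwise sells
    at the final price. Worst-case competitive ratio: sqrt(U / L).
    """
    if not prices:
        raise ValueError("need at least one price")
    threshold = math.sqrt(L * U)
    for price in prices:
        if price >= threshold:
            return price
    return prices[-1]  # forced conversion at the deadline
```

For example, with $L=1$ and $U=16$ the threshold is 4, so on prices 2, 5, 9, 3 the rule sells at 5 while the offline optimum is 9, a ratio of 1.8, within the guaranteed bound of 4.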
Generative adversarial imitation learning (GAIL) is a popular inverse reinforcement learning approach for jointly optimizing policy and reward from expert trajectories. A primary question about GAIL is whether applying a certain policy gradient algorithm to GAIL attains a global minimizer (i.e., yields the expert policy), a question for which existing understanding is very limited. Such global convergence has been shown only for linear (or linear-type) MDPs and linear (or linearizable) rewards. In this paper, we study GAIL under general MDPs and for nonlinear reward function classes (as long as the objective function is strongly concave with respect to the reward parameter). We characterize global convergence with a sublinear rate for a broad range of commonly used policy gradient algorithms, all of which are implemented in an alternating manner with stochastic gradient ascent for the reward update, including projected policy gradient (PPG)-GAIL, Frank-Wolfe policy gradient (FWPG)-GAIL, trust region policy optimization (TRPO)-GAIL, and natural policy gradient (NPG)-GAIL. This is the first systematic theoretical study of GAIL for global convergence.
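To make the alternating structure concrete, here is a deliberately tiny sketch that is much simpler than the paper's setting: a single-state, single-step MDP (a bandit), a tabular softmax policy updated by NPG, and a linear reward $r_w(a)=w_a$ with an added $-\frac{\lambda}{2}\|w\|^2$ term so that the objective is strongly concave in the reward parameter. Exact gradients stand in for the stochastic ones, and the expert distribution, step sizes, and function names are all hypothetical.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(v - m) for v in theta]
    z = sum(exps)
    return [e / z for e in exps]


def toy_npg_gail(expert, lam=1.0, eta=0.2, beta=0.2, iters=5000):
    """Alternating reward ascent and NPG policy steps on a one-state, one-step MDP.

    Reward r_w(a) = w[a]. The min-max objective
        L(pi, w) = E_{a~expert}[r_w(a)] - E_{a~pi}[r_w(a)] - lam/2 * ||w||^2
    is strongly concave in w; its maximum over w is ||expert - pi||^2 / (2 * lam),
    which is minimized exactly at pi = expert.
    """
    n = len(expert)
    theta = [0.0] * n  # softmax policy logits
    w = [0.0] * n      # reward parameters
    for _ in range(iters):
        pi = softmax(theta)
        # Reward step: deterministic gradient ascent, dL/dw = expert - pi - lam * w.
        w = [w[a] + beta * (expert[a] - pi[a] - lam * w[a]) for a in range(n)]
        # Policy step: tabular softmax NPG, theta <- theta + eta * advantage. In a single
        # state the advantage is r_w minus a constant, which the softmax ignores.
        theta = [theta[a] + eta * w[a] for a in range(n)]
    return softmax(theta), w
```

Calling `toy_npg_gail([0.5, 0.3, 0.2])` should drive the policy to the expert distribution and the reward parameters to zero, the saddle point of this toy objective.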
Industrial control systems are critical to the operation of industrial facilities, especially critical infrastructures such as refineries, power grids, and transportation systems. As with other information systems, a significant threat to industrial control systems is attack from cyberspace: offensive maneuvers launched by anonymous actors in the digital world that target computer-based assets with the goal of compromising a system's functions or probing for information. Owing to the importance of industrial control systems and the potentially devastating consequences of an attack, significant efforts have been made to secure them against cyberattacks. Among these are intrusion detection systems, which serve as the first line of defense by monitoring and reporting potentially malicious activities. Classical machine-learning-based intrusion detection methods usually build prediction models by learning from a modest-sized set of training samples all at once. Such an approach is not always applicable to industrial control systems, which must process continuous control commands nonstop with limited computational resources. To satisfy these requirements, we propose using online learning to learn prediction models from the control data stream. We introduce several state-of-the-art online learning algorithms by category and illustrate their efficacy on two commonly used testbeds: a power system and a gas pipeline. Further, we explore a new cost-sensitive online learning algorithm to address the class-imbalance problem that is pervasive in industrial intrusion detection systems. Our experimental results indicate that the proposed algorithm can achieve an overall improvement in the detection rate of cyberattacks in industrial control systems.
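The abstract does not spell out its cost-sensitive algorithm, so the sketch below swaps in a generic, well-known technique: online logistic regression in which each example's gradient is scaled by the misclassification cost of its class, so rare attacks move the model more than normal traffic. The class, the cost values, and the synthetic one-feature stream are hypothetical stand-ins, not the paper's method or testbed data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class CostSensitiveOnlineLogReg:
    """Online logistic regression with class-dependent misclassification costs.

    Each example's log-loss gradient is scaled by the cost of its class, so errors
    on the rare attack class (label 1) move the model more than errors on normal traffic.
    """

    def __init__(self, dim, lr=0.01, cost_attack=1.0, cost_normal=1.0):
        self.w = [0.0] * (dim + 1)  # last entry is the bias term
        self.lr = lr
        self.cost = {1: cost_attack, 0: cost_normal}

    def score(self, x):
        xb = list(x) + [1.0]
        return sum(wi * xi for wi, xi in zip(self.w, xb))

    def predict(self, x):
        """Flag an attack (1) when the estimated attack probability is at least 0.5."""
        return 1 if self.score(x) >= 0.0 else 0

    def update(self, x, y):
        """One stochastic gradient step on the cost-weighted log loss, y in {0, 1}."""
        xb = list(x) + [1.0]
        g = self.cost[y] * (sigmoid(self.score(x)) - y)
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, xb)]


def synthetic_stream(n, attack_rate=0.05, seed=0):
    """Hypothetical imbalanced stream: one feature, N(0, 1) if normal, N(2, 1) if attack."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n):
        y = 1 if rng.random() < attack_rate else 0
        stream.append(([rng.gauss(2.0 if y else 0.0, 1.0)], y))
    return stream
```

Setting `cost_attack` near the inverse class ratio (19 for a 5% attack rate) is a common heuristic that roughly balances the total loss of the two classes; it is an assumption here, not a recommendation taken from the paper.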
Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms into their online counterparts. We focus on offline combinatorial problems that are amenable to a constant-factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms into online ones using Blackwell approachability. We show that the resulting online algorithms have $O(\sqrt{T})$ (approximate) regret under the full-information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability. We leverage this notion to transform offline robust greedy algorithms into online algorithms with $O(T^{2/3})$ (approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the currently known bounds.
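As background on how Blackwell approachability yields $O(\sqrt{T})$ regret under full information, the sketch below implements regret matching (Hart and Mas-Colell), the classical instance in which the vector of cumulative regrets is steered toward the nonpositive orthant. For payoffs in $[0,1]$ over $n$ actions the approachability argument gives regret at most $\sqrt{nT}$. This is a standard building block, not the paper's offline-to-online transformation or its bandit extension; the payoff generator is a hypothetical stand-in for an adversary.

```python
import random


def regret_matching(payoff_rounds):
    """Regret matching (Hart and Mas-Colell) under full information.

    Blackwell approachability view: playing p_t proportional to the positive part of
    the cumulative regret vector R_{t-1} makes the new instantaneous regret vector
    orthogonal to R_{t-1}^+, which drives R_t toward the nonpositive orthant.
    payoff_rounds: sequence of payoff vectors u_t in [0, 1]^n.
    Returns the external regret max_i sum_t u_t[i] - sum_t <p_t, u_t>.
    """
    n = len(payoff_rounds[0])
    cum_regret = [0.0] * n
    for u in payoff_rounds:
        positive = [max(r, 0.0) for r in cum_regret]
        total = sum(positive)
        # Play proportionally to positive regrets; uniform when none is positive.
        p = [r / total for r in positive] if total > 0 else [1.0 / n] * n
        expected = sum(pi * ui for pi, ui in zip(p, u))
        cum_regret = [r + ui - expected for r, ui in zip(cum_regret, u)]
    return max(cum_regret)


def random_payoffs(n, T, seed=0):
    """Hypothetical adversary stand-in: i.i.d. uniform payoffs in [0, 1]."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n)] for _ in range(T)]
```

The bound holds for every payoff sequence, not only random ones. How the paper couples approachability with the robust greedy algorithm, and how its bandit variant reaches $O(T^{2/3})$, are not reproduced here.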