Provably-Efficient Double Q-Learning

93 0 0.0 ( 0 )

Download Cite

Added by Wentao Weng

Publication date 2020

fields Informatics Engineering Mathematical Statistics

and research's language is English

Authors Wentao Weng - Harsh Gupta - Niao He

Machine Learning Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning. Our result builds upon an analysis for linear stochastic approximation based on Lyapunov equations and applies to both tabular setting and with linear function approximation, provided that the optimal policy is unique and the algorithms converge. We show that the asymptotic mean-squared error of Double Q-learning is exactly equal to that of Q-learning if Double Q-learning uses twice the learning rate of Q-learning and outputs the average of its two estimators. We also present some practical implications of this theoretical observation using simulations.

rate research

Provably More Efficient Q-Learning in the One-Sided-Feedback/Full-Feedback Settings

65 - Xiao-Yue Gong , David Simchi-Levi 2020

Motivated by the episodic version of the classical inventory control problem, we propose a new Q-learning-based algorithm, Elimination-Based Half-Q-Learning (HQL), that enjoys improved efficiency over existing algorithms for a wide variety of problems in the one-sided-feedback setting. We also provide a simpler variant of the algorithm, Full-Q-Learning (FQL), for the full-feedback setting. We establish that HQL incurs $ tilde{mathcal{O}}(H^3sqrt{ T})$ regret and FQL incurs $tilde{mathcal{O}}(H^2sqrt{ T})$ regret, where $H$ is the length of each episode and $T$ is the total length of the horizon. The regret bounds are not affected by the possibly huge state and action space. Our numerical experiments demonstrate the superior efficiency of HQL and FQL, and the potential to combine reinforcement learning with richer feedback models.

Machine Learning Machine Learning

Is Q-Learning Provably Efficient? An Extended Analysis

201 - Kushagra Rastogi , Jonathan Lee , Fabrice Harel-Canada 2020

This work extends the analysis of the theoretical results presented within the paper Is Q-Learning Provably Efficient? by Jin et al. We include a survey of related research to contextualize the need for strengthening the theoretical guarantees related to perhaps the most important threads of model-free reinforcement learning. We also expound upon the reasoning used in the proofs to highlight the critical steps leading to the main result showing that Q-learning with UCB exploration achieves a sample efficiency that matches the optimal regret that can be achieved by any model-based approach.

Machine Learning Artificial Intelligence Optimization and Control

Provably Robust Metric Learning

135 - Lu Wang , Xuanqing Liu , Jinfeng Yi 2020

Metric learning is an important family of algorithms for classification and similarity search, but the robustness of learned metrics against small adversarial perturbations is less studied. In this paper, we show that existing metric learning algorithms, which focus on boosting the clean accuracy, can result in metrics that are less robust than the Euclidean distance. To overcome this problem, we propose a novel metric learning algorithm to find a Mahalanobis distance that is robust against adversarial perturbations, and the robustness of the resulting model is certifiable. Experimental results show that the proposed metric learning algorithm improves both certified robust errors and empirical robust errors (errors under adversarial attacks). Furthermore, unlike neural network defenses which usually encounter a trade-off between clean and robust errors, our method does not sacrifice clean errors compared with previous metric learning methods. Our code is available at https://github.com/wangwllu/provably_robust_metric_learning.

Machine Learning Machine Learning

Provably Consistent Partial-Label Learning

76 - Lei Feng , Jiaqi Lv , Bo Han 2020

Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels. Even though many practical PLL methods have been proposed in the last two decades, there lacks a theoretical understanding of the consistency of those methods-none of the PLL methods hitherto possesses a generation process of candidate label sets, and then it is still unclear why such a method works on a specific dataset and when it may fail given a different dataset. In this paper, we propose the first generation model of candidate label sets, and develop two novel PLL methods that are guaranteed to be provably consistent, i.e., one is risk-consistent and the other is classifier-consistent. Our methods are advantageous, since they are compatible with any deep network or stochastic optimizer. Furthermore, thanks to the generation model, we would be able to answer the two questions above by testing if the generation model matches given candidate label sets. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed generation model and two PLL methods.

Machine Learning Machine Learning

Koopman Spectrum Nonlinear Regulator and Provably Efficient Online Learning

80 - Motoya Ohnishi , Isao Ishikawa , Kendall Lowrey 2021

Most modern reinforcement learning algorithms optimize a cumulative single-step cost along a trajectory. The optimized motions are often unnatural, representing, for example, behaviors with sudden accelerations that waste energy and lack predictability. In this work, we present a novel paradigm of controlling nonlinear systems via the minimization of the Koopman spectrum cost: a cost over the Koopman operator of the controlled dynamics. This induces a broader class of dynamical behaviors that evolve over stable manifolds such as nonlinear oscillators, closed loops, and smooth movements. We demonstrate that some dynamics realizations that are not possible with a cumulative cost are feasible in this paradigm. Moreover, we present a provably efficient online learning algorithm for our problem that enjoys a sub-linear regret bound under some structural assumptions.

Machine Learning Robotics Systems and Control

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Provably-Efficient Double Q-Learning

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions