An Empirical Evaluation of True Online TD(λ)


Published by: Harm van Seijen

Publication date: 2015

Research field: Informatics Engineering

Paper language: English

Authors: Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski


Abstract

The true online TD(λ) algorithm has recently been proposed (van Seijen and Sutton, 2014) as a universal replacement for the popular TD(λ) algorithm in temporal-difference learning and reinforcement learning. True online TD(λ) has better theoretical properties than conventional TD(λ), and the expectation is that it also results in faster learning. In this paper, we put this hypothesis to the test. Specifically, we compare the performance of true online TD(λ) with that of TD(λ) on challenging examples, random Markov reward processes, and a real-world myoelectric prosthetic arm. We use linear function approximation with tabular, binary, and non-binary features. We assess the algorithms along three dimensions: computational cost, learning speed, and ease of use. Our results confirm the strength of true online TD(λ): (1) for sparse feature vectors, the computational overhead with respect to TD(λ) is minimal, while for non-sparse features the computation time is at most twice that of TD(λ); (2) across all domains and representations, the learning speed of true online TD(λ) is often better, but never worse, than that of TD(λ); and (3) true online TD(λ) is easier to use, because it does not require choosing between trace types and it is generally more stable with respect to the step-size. Overall, our results suggest that true online TD(λ) should be the first choice when looking for an efficient, general-purpose TD method.
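The abstract does not spell out the update rules it compares, so the following is only a minimal sketch, assuming linear function approximation, a single episode, and the commonly published form of true online TD(λ) with dutch traces. The function names, the parameter names (`alpha`, `gamma`, `lam`), and the three-state chain are illustrative assumptions, not taken from the paper. The sketch is meant to show where the extra per-step work of true online TD(λ) comes from: one additional dot product and one additional correction term on top of conventional TD(λ) with accumulating traces.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def td_lambda_accumulating(transitions, n, alpha, gamma, lam):
    """Conventional TD(lambda) with accumulating traces (linear features).

    transitions: list of (x, reward, x_next) tuples for one episode, where
    x and x_next are feature vectors of length n; a terminal state is
    represented by an all-zero feature vector (an assumption of this sketch).
    """
    w = [0.0] * n  # weight vector
    z = [0.0] * n  # eligibility trace
    for x, r, x_next in transitions:
        delta = r + gamma * dot(w, x_next) - dot(w, x)  # TD error
        z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
        w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w


def true_online_td_lambda(transitions, n, alpha, gamma, lam):
    """True online TD(lambda) with dutch traces (linear features)."""
    w = [0.0] * n
    z = [0.0] * n
    v_old = 0.0  # value estimate of the current state from the previous step
    for x, r, x_next in transitions:
        v = dot(w, x)
        v_next = dot(w, x_next)
        delta = r + gamma * v_next - v
        zx = dot(z, x)  # the extra dot product needed by the dutch trace
        z = [gamma * lam * zi + (1.0 - alpha * gamma * lam * zx) * xi
             for zi, xi in zip(z, x)]
        w = [wi + alpha * (delta + v - v_old) * zi - alpha * (v - v_old) * xi
             for wi, zi, xi in zip(w, z, x)]
        v_old = v_next
    return w


# Hypothetical three-state chain with tabular (one-hot) features,
# reward 1 on the final transition into the terminal state.
episode = [
    ([1, 0, 0], 0.0, [0, 1, 0]),
    ([0, 1, 0], 0.0, [0, 0, 1]),
    ([0, 0, 1], 1.0, [0, 0, 0]),
]
w_conventional = td_lambda_accumulating(episode, 3, alpha=0.1, gamma=0.9, lam=0.9)
w_true_online = true_online_td_lambda(episode, 3, alpha=0.1, gamma=0.9, lam=0.9)
```

With λ = 0 the dutch trace reduces to the current feature vector and the correction terms cancel, so both functions perform the same TD(0) update. The two methods can differ once λ > 0 and features are active on more than one step of an episode.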


Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, and Marlos C. Machado (2015)

The temporal-difference methods TD(λ) and Sarsa(λ) form a core part of modern reinforcement learning. Their appeal comes from their good performance, low computational cost, and their simple interpretation, given by their forward view. Recently, n

Artificial intelligence, Machine learning

An Empirical Evaluation of Portfolios Approaches for solving CSPs

Roberto Amadini, Maurizio Gabbrielli, Jacopo Mauro (2012)

Recent research in areas such as SAT solving and Integer Linear Programming has shown that a single arbitrarily efficient solver can be significantly outperformed by a portfolio of solvers that may each be slower on average. We report an empirical evaluation and comparison of portfolio approaches applied to Constraint Satisfaction Problems (CSPs). We compared models developed on top of off-the-shelf machine learning algorithms with approaches used in the SAT field and adapted for CSPs, considering different portfolio sizes and using as evaluation metrics the number of solved problems and the time taken to solve them. Results indicate that the best SAT approaches also perform best in the CSP field and are slightly more competitive than simple models built on top of classification algorithms.

Artificial intelligence, Machine learning

An Empirical Comparison of Algorithms for Aggregating Expert Predictions

Varsha Dani, Omid Madani, David M Pennock (2012)

Predicting the outcomes of future events is a challenging problem for which a variety of solution methods have been explored and attempted. We present an empirical comparison of a variety of online and offline adaptive algorithms for aggregating experts' predictions of the outcomes of five years of US National Football League games (1319 games) using expert probability elicitations obtained from an Internet contest called ProbabilitySports. We find that it is difficult to improve over simple averaging of the predictions in terms of prediction accuracy, but that there is room for improvement in quadratic loss. Somewhat surprisingly, a Bayesian estimation algorithm which estimates the variance of each expert's prediction exhibits the most consistent superior performance over simple averaging among our collection of algorithms.

Artificial intelligence, Machine learning

An Empirical Evaluation of Cost-based Federated SPARQL Query Processing Engines

Umair Qudus, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo (2021)

Finding a good query plan is key to the optimization of query runtime. This holds in particular for cost-based federation engines, which make use of cardinality estimations to achieve this goal. A number of studies compare SPARQL federation engines across different performance metrics, including query runtime, result set completeness and correctness, number of sources selected, and number of requests sent. Albeit informative, these metrics are generic and unable to quantify and evaluate the accuracy of the cardinality estimators of cost-based federation engines. To thoroughly evaluate cost-based federation engines, the effect of estimated cardinality errors on the overall query runtime performance must be measured. In this paper, we address this challenge by presenting novel evaluation metrics targeted at a fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate five cost-based federated SPARQL query engines using existing as well as novel evaluation metrics on LargeRDFBench queries. Our results provide a detailed analysis of the experimental outcomes that reveals novel insights, useful for the development of future cost-based federated SPARQL query processing engines.

Databases, Machine learning, Performance

Multi-Dueling Bandits and Their Application to Online Ranker Evaluation

Brian Brost, Yevgeny Seldin, Ingemar J. Cox (2016)

New ranking algorithms are continually being developed and refined, necessitating the development of efficient methods for evaluating these rankers. Online ranker evaluation focuses on the challenge of efficiently determining, from implicit user feedback, which ranker out of a finite set of rankers is the best. Online ranker evaluation can be modeled by dueling bandits, a mathematical model for online learning under limited feedback from pairwise comparisons. Comparison of pairs of rankers is performed by interleaving their result sets and examining which documents users click on. The dueling bandits model addresses the key issue of which pair of rankers to compare at each iteration, thereby providing a solution to the exploration-exploitation trade-off. Recently, methods for simultaneously comparing more than two rankers have been developed. However, the question of which rankers to compare at each iteration was left open. We address this question by proposing a generalization of the dueling bandits model that uses simultaneous comparisons of an unrestricted number of rankers. We evaluate our algorithm on synthetic data and several standard large-scale online ranker evaluation datasets. Our experimental results show that the algorithm yields orders of magnitude improvement in performance compared to state-of-the-art dueling bandit algorithms.

Information retrieval, Machine learning