Robots performing manipulation tasks must operate under uncertainty about both their pose and the dynamics of the system. To remain robust to modeling error and shifts in payload dynamics, agents must perform estimation and control simultaneously. However, the actions that are best for estimation are often not the actions that are best for control, so agents must trade off exploration against exploitation. This work frames the problem as a Bayes-adaptive Markov decision process and solves it online using Monte Carlo tree search (MCTS) and an extended Kalman filter to handle Gaussian process noise and parameter uncertainty in a continuous space. MCTS selects control actions that reduce model uncertainty and reach the goal state nearly optimally. Certainty-equivalent model predictive control serves as the benchmark for comparing performance in simulations with varying process noise and parameter uncertainty.
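To make the estimation side of this concrete, the following is a minimal sketch of an extended Kalman filter run on an augmented state that stacks a physical state with an unknown parameter. It is not the paper's implementation: the scalar model (a velocity `v` driven by a control `u` through an unknown gain `theta`), the noise levels, and the names `ekf_step` and `run` are all assumptions chosen for illustration, and the tree search that would choose the controls is omitted. For this toy model the dynamics are linear in the augmented state once `u` is fixed, so the Jacobian used here is exact.

```python
import random

def ekf_step(z, P, u, y, dt=0.1, q=1e-4, r=1e-4):
    """One EKF predict/update on the augmented state z = [v, theta].

    Hypothetical toy model: v' = v + theta*u*dt + w, theta' = theta,
    with measurement y = v + noise. q and r are the noise variances.
    """
    v, theta = z
    # Predict the mean; the parameter is modeled as constant.
    z_pred = [v + theta * u * dt, theta]
    a = u * dt  # d(v')/d(theta), so the Jacobian is F = [[1, a], [0, 1]]
    # P_pred = F P F^T + Q with Q = diag(q, 0)
    p00 = P[0][0] + a * (P[0][1] + P[1][0]) + a * a * P[1][1] + q
    p01 = P[0][1] + a * P[1][1]
    p10 = P[1][0] + a * P[1][1]
    p11 = P[1][1]
    # Update with measurement matrix H = [1, 0]
    s = p00 + r                      # innovation variance
    k0, k1 = p00 / s, p10 / s        # Kalman gain
    innov = y - z_pred[0]
    z_new = [z_pred[0] + k0 * innov, z_pred[1] + k1 * innov]
    P_new = [[(1.0 - k0) * p00, (1.0 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
    return z_new, P_new

def run(controls, theta_true=2.0, seed=0, dt=0.1, q=1e-4, r=1e-4):
    """Simulate the toy system and filter it; returns the final (z, P)."""
    rng = random.Random(seed)
    v_true = 0.0
    # Prior: v = 0, theta = 1, unit variance on both, no correlation.
    z, P = [0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]
    for u in controls:
        v_true += theta_true * u * dt + rng.gauss(0.0, q ** 0.5)
        y = v_true + rng.gauss(0.0, r ** 0.5)
        z, P = ekf_step(z, P, u, y, dt, q, r)
    return z, P

# Exciting controls make theta observable; idle controls do not.
exciting = [1.0 if k % 2 == 0 else -1.0 for k in range(200)]
z_active, P_active = run(exciting)
z_idle, P_idle = run([0.0] * 200)
```

In this sketch the parameter variance `P_active[1][1]` shrinks under the alternating control, while `P_idle[1][1]` stays at its prior value because a zero control carries no information about `theta`. That is a small instance of the tension the abstract describes: an action that is cheap for the control objective can be uninformative for estimation.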
Accurate identification of parameters of load models is essential in power system computations, including simulation, prediction, and stability and reliability analysis. Conventional point estimation based composite load modeling approaches suffer fr
We propose a Thompson sampling-based learning algorithm for the Linear Quadratic (LQ) control problem with unknown system parameters. The algorithm is called Thompson sampling with dynamic episodes (TSDE) where two stopping criteria determine the len
Measurement and estimation of parameters are essential for science and engineering, where one of the main quests is to find systematic schemes that can achieve high precision. While conventional schemes for quantum parameter estimation focus on the o
This paper proposes a reinforcement learning approach for traffic control with the adaptive horizon. To build the controller for the traffic network, a Q-learning-based strategy that controls the green light passing time at the network intersections
Bayesian analysis is a framework for parameter estimation that applies even in uncertainty regimes where the commonly used local (frequentist) analysis based on the Cramér-Rao bound is not well defined. In particular, it applies when no initial infor