Robots performing manipulation tasks must operate under uncertainty about both their pose and the dynamics of the system. To remain robust to modeling error and shifts in payload dynamics, agents must perform estimation and control simultaneously. However, the actions that are optimal for estimation are often not optimal for the control task, so agents must trade off exploration against exploitation. This work frames the problem as a Bayes-adaptive Markov decision process and solves it online using Monte Carlo tree search (MCTS) together with an extended Kalman filter that handles Gaussian process noise and parameter uncertainty in a continuous space. MCTS selects control actions that reduce model uncertainty and reach the goal state nearly optimally. Certainty-equivalent model predictive control serves as a benchmark for comparing performance in simulations with varying process noise and parameter uncertainty.
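To make the approach concrete, here is a minimal sketch (not the paper's implementation) of the two ingredients on a toy one-dimensional system x' = a*x + u + w with unknown gain a: an extended Kalman filter tracks a joint Gaussian belief over the state and the parameter, and Monte Carlo lookahead over models sampled from that belief stands in for full MCTS. The goal state, noise levels, and discrete action set are illustrative assumptions.

```python
import numpy as np

# Minimal sketch, not the paper's implementation: EKF belief tracking plus
# Monte Carlo lookahead (a stand-in for full MCTS) on a toy 1-D system
# x' = a*x + u + w with unknown gain `a`. Constants below are assumptions.
GOAL, Q_NOISE, R_NOISE = 1.0, 0.01, 0.05
ACTIONS = np.linspace(-1.0, 1.0, 5)

def ekf_step(mu, P, u, z):
    """Predict with x' = a*x + u on the augmented state [x, a], then correct."""
    x, a = mu
    mu_pred = np.array([a * x + u, a])
    F = np.array([[a, x], [0.0, 1.0]])              # Jacobian w.r.t. [x, a]
    P_pred = F @ P @ F.T + np.diag([Q_NOISE, 0.0])
    H = np.array([[1.0, 0.0]])                      # we observe x directly
    S = (H @ P_pred @ H.T).item() + R_NOISE
    K = (P_pred @ H.T) / S
    mu_new = mu_pred + (K * (z - mu_pred[0])).ravel()
    return mu_new, (np.eye(2) - K @ H) @ P_pred

def plan(mu, P, rng, depth=8, n_samples=40):
    """Score each first action by rollouts under models sampled from the belief."""
    scores = []
    for u0 in ACTIONS:
        total = 0.0
        for _ in range(n_samples):
            a = rng.normal(mu[1], np.sqrt(P[1, 1])) # sample the unknown gain
            x = a * mu[0] + u0 + rng.normal(0, np.sqrt(Q_NOISE))
            total -= (x - GOAL) ** 2
            for _ in range(depth - 1):              # random-policy rollout
                x = a * x + rng.choice(ACTIONS) + rng.normal(0, np.sqrt(Q_NOISE))
                total -= (x - GOAL) ** 2
        scores.append(total / n_samples)
    return ACTIONS[int(np.argmax(scores))]

rng = np.random.default_rng(0)
a_true, x = 0.8, 0.0
mu, P = np.array([0.0, 0.5]), np.diag([0.1, 0.25])  # initial belief over [x, a]
for _ in range(15):
    u = plan(mu, P, rng)
    x = a_true * x + u + rng.normal(0, np.sqrt(Q_NOISE))
    z = x + rng.normal(0, np.sqrt(R_NOISE))
    mu, P = ekf_step(mu, P, u, z)
print(f"final state {x:.3f}, estimated a {mu[1]:.3f} (true {a_true})")
```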
Accurate identification of load-model parameters is essential in power system computations, including simulation, prediction, and stability and reliability analysis. Conventional point-estimation-based composite load modeling approaches are sensitive to disturbances and noise and provide limited information about the system dynamics. In this work, a statistical (Bayesian estimation) approach is proposed that yields distribution estimates for both static (ZIP) and dynamic (induction motor) load models. When multiple parameters are involved, Gibbs sampling is employed: in each iteration, the sampler draws one parameter while keeping the others fixed. The proposed method provides a distributional estimate of the load-model coefficients and is robust to measurement errors.
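As an illustration of the sampling scheme, the sketch below runs Metropolis-within-Gibbs for the static ZIP model P ≈ a·V² + b·V + c with a + b + c = 1 (per-unit voltages), assuming Gaussian measurement noise; the data, noise level, and proposal widths are invented for the example, and the paper's exact sampler may differ.

```python
import numpy as np

# Metropolis-within-Gibbs sketch for ZIP coefficients a, b, c (a + b + c = 1),
# assuming Gaussian measurement noise. Data, noise level, and proposal widths
# are invented for the example; the paper's exact sampler may differ.
rng = np.random.default_rng(0)
V = rng.uniform(0.9, 1.1, 200)                       # per-unit voltage samples
a_true, b_true, sigma = 0.4, 0.35, 0.01
P_meas = (a_true * V**2 + b_true * V + (1 - a_true - b_true)
          + rng.normal(0, sigma, V.size))

def log_lik(a, b):
    c = 1.0 - a - b
    if min(a, b, c) < 0:                             # keep load shares nonnegative
        return -np.inf
    resid = P_meas - (a * V**2 + b * V + c)
    return -0.5 * np.sum(resid**2) / sigma**2

samples, a, b = [], 0.3, 0.3
for _ in range(5000):
    for name in ("a", "b"):                          # Gibbs sweep: one coordinate at a time
        pa, pb = (a + rng.normal(0, 0.02), b) if name == "a" else (a, b + rng.normal(0, 0.02))
        if np.log(rng.uniform()) < log_lik(pa, pb) - log_lik(a, b):
            a, b = pa, pb                            # Metropolis accept
    samples.append((a, b))

post = np.array(samples[1000:])                      # drop burn-in
print("posterior mean (a, b):", post.mean(axis=0), "truth:", (a_true, b_true))
```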
We propose a Thompson sampling-based learning algorithm for the Linear Quadratic (LQ) control problem with unknown system parameters. The algorithm, called Thompson sampling with dynamic episodes (TSDE), uses two stopping criteria to determine the lengths of its dynamic episodes. The first criterion controls the growth rate of the episode length; the second is triggered when the determinant of the sample covariance matrix falls below half of its value at the beginning of the episode. We show, under some conditions on the prior distribution, that the expected (Bayesian) regret of TSDE accumulated up to time T is bounded by Õ(√T), where Õ(·) hides constants and logarithmic factors. This is the first Õ(√T) bound on the expected regret of learning in LQ control. By introducing a reinitialization schedule, we also show that the algorithm is robust to time-varying drift in the model parameters. Numerical simulations illustrate the performance of TSDE.
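A minimal sketch of the episode logic on a scalar system x' = a·x + b·u + w (an assumption; the paper treats vector systems) is given below: the posterior over (a, b) is a Bayesian linear regression, a model is sampled at the start of each episode, the corresponding LQ gain comes from the scalar Riccati equation, and both stopping criteria end episodes.

```python
import numpy as np

# TSDE-style sketch on a scalar LQ system x' = a*x + b*u + w (an assumption; the
# paper treats vector systems). The posterior over (a, b) is a Bayesian linear
# regression; both stopping criteria from the abstract end the episodes.
rng = np.random.default_rng(1)
a_true, b_true, q, r, sigma = 0.9, 0.5, 1.0, 1.0, 0.1

def lqr_gain(a, b, iters=200):
    """Fixed-point iteration of the scalar Riccati equation; returns the gain K."""
    P = q
    for _ in range(iters):
        P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
    return a * b * P / (r + b * b * P)

Lam, bvec = np.eye(2), np.zeros(2)        # posterior precision and Lam @ mean
x, t, T, ep_len_prev = 0.0, 0, 5000, 0
while t < T:
    cov = np.linalg.inv(Lam)
    theta = rng.multivariate_normal(np.linalg.solve(Lam, bvec), cov)  # Thompson sample
    K, ep_len, det_start = lqr_gain(*theta), 0, np.linalg.det(cov)
    while t < T:
        u = -K * x
        x_next = a_true * x + b_true * u + rng.normal(0, sigma)
        z = np.array([x, u])
        Lam += np.outer(z, z) / sigma**2              # recursive regression update
        bvec += z * x_next / sigma**2
        x, t, ep_len = x_next, t + 1, ep_len + 1
        # Criterion 1: episode exceeds the previous one's length by one step.
        # Criterion 2: covariance determinant halved since the episode started.
        if ep_len >= ep_len_prev + 1 or np.linalg.det(np.linalg.inv(Lam)) < 0.5 * det_start:
            break
    ep_len_prev = ep_len
print("posterior mean (a, b):", np.linalg.solve(Lam, bvec))
```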
Measurement and estimation of parameters are essential in science and engineering, where one of the main quests is to find systematic schemes that achieve high precision. While conventional schemes for quantum parameter estimation focus on optimizing the probe states and measurements, it has recently been realized that control during the evolution can significantly improve the precision. Identifying the optimal controls, however, is often computationally demanding, since they typically depend on the value of the parameter and must therefore be re-calculated after each update of the estimate. Here we show that reinforcement learning provides an efficient way to identify controls that improve the precision. We also demonstrate that reinforcement learning is highly generalizable: a neural network trained at one particular value of the parameter works for different values within a broad range. These features make reinforcement learning an efficient alternative to conventional optimal quantum control methods.
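To illustrate the idea without the full quantum setup, the sketch below applies a simple policy-search routine (the cross-entropy method, standing in for the paper's neural-network agent) to a classical analogue: a dephased Ramsey-type probe with outcome probability p = (1 + e^(-γt) cos(ωt + u))/2, where the control u is tuned to maximize the Fisher information about ω. The probe model and all constants are illustrative assumptions; note that the optimal control depends on ω, which is precisely the coupling the paper addresses.

```python
import numpy as np

# Toy policy-search sketch (cross-entropy method, standing in for the paper's
# neural-network agent) on a classical analogue: a dephased Ramsey-type probe
# with outcome probability p = (1 + exp(-g*t)*cos(w*t + u)) / 2. The control u
# is tuned to maximize the Fisher information about w. All model details are
# illustrative assumptions.
g, t = 0.1, 1.0

def fisher(w, u):
    """Classical Fisher information of one binary outcome about w."""
    phi, e = w * t + u, np.exp(-g * t)
    p = (1 + e * np.cos(phi)) / 2
    dpdw = -t * e * np.sin(phi) / 2
    return dpdw**2 / (p * (1 - p))

def best_control(w_est, rounds=30, pop=64, elite=8, seed=0):
    """Cross-entropy search for the control maximizing Fisher information at w_est."""
    rng = np.random.default_rng(seed)
    mu, sd = 0.0, 2.0
    for _ in range(rounds):
        u = rng.normal(mu, sd, pop)
        top = u[np.argsort([fisher(w_est, ui) for ui in u])[-elite:]]
        mu, sd = top.mean(), top.std() + 1e-3        # refit the sampling distribution
    return mu

w_est = 0.7
u_star = best_control(w_est)
# For this toy model the analytic optimum is u = pi/2 - w*t (mod pi).
print(u_star % np.pi, (np.pi / 2 - w_est * t) % np.pi)
```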
This paper proposes a reinforcement learning approach for traffic control with an adaptive horizon. To control the traffic network, a Q-learning-based strategy sets the green-light passing time at the network intersections. The controller has two components: a regular Q-learning controller that sets the traffic light signal, and an adaptive controller that continuously optimizes the action space of the Q-learning algorithm to improve its efficiency. The regular Q-learning controller uses the control cost as its reward function when choosing actions. The adaptive controller examines the control cost and updates the controller's action space by identifying the subset of actions most likely to yield optimal results and shrinking the action space to that subset. Uncertainties in traffic influx and turning rate are introduced to test the controller's robustness in a stochastic environment. The results show that the proposed Q-learning-based controller outperforms model predictive control (MPC), reaching a stable solution in a shorter time and achieving lower control costs. The proposed controller is also robust under 30% traffic demand uncertainty and 15% turning rate uncertainty.
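A minimal sketch of the two-component controller on a toy single intersection is given below: tabular Q-learning chooses a green time, and the adaptive component periodically shrinks the action set to its better-performing half. The queue dynamics, state discretization, and shrink rule are illustrative assumptions, not the paper's network model.

```python
import numpy as np

# Sketch of the two-component controller on a toy single intersection: tabular
# Q-learning picks a green time, and the adaptive component periodically shrinks
# the action set to its better-performing half. Queue dynamics, state bins, and
# the shrink rule are illustrative assumptions, not the paper's network model.
rng = np.random.default_rng(2)
ACTIONS = list(range(10, 61, 5))          # candidate green times (seconds)

def step(queues, green):
    """Toy queue dynamics: Poisson arrivals, service proportional to green time."""
    arrivals = rng.poisson([4.0, 3.0])
    served = np.minimum(queues + arrivals, [green * 0.2, (60 - green) * 0.2])
    queues = queues + arrivals - served
    return queues, -queues.sum()          # reward = negative total queue length

def disc(queues):
    return tuple(np.minimum(queues // 5, 9).astype(int))   # coarse state bins

Q, actions, queues = {}, list(ACTIONS), np.zeros(2)
returns = {a: [] for a in ACTIONS}
for i in range(20000):
    s = disc(queues)
    Q.setdefault(s, dict.fromkeys(ACTIONS, 0.0))
    a = rng.choice(actions) if rng.random() < 0.1 else max(actions, key=lambda u: Q[s][u])
    queues, rwd = step(queues, a)
    s2 = disc(queues)
    Q.setdefault(s2, dict.fromkeys(ACTIONS, 0.0))
    Q[s][a] += 0.1 * (rwd + 0.95 * max(Q[s2][u] for u in actions) - Q[s][a])
    returns[a].append(rwd)
    if i % 2000 == 1999 and len(actions) > 4:
        # Adaptive component: keep the half of the actions with the best mean return.
        ranked = sorted(actions, key=lambda u: -np.mean(returns[u] or [-1e9]))
        actions = ranked[: max(4, len(actions) // 2)]
print("final action space:", actions)
```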
Bayesian analysis is a framework for parameter estimation that applies even in uncertainty regimes where the commonly used local (frequentist) analysis based on the Cramér-Rao bound is not well defined. In particular, it applies when no initial information about the parameter value is available, e.g., when only few measurements have been performed. Here, we consider three paradigmatic estimation schemes in continuous-variable quantum metrology (estimation of displacements, phases, and squeezing strengths) and analyse them from the Bayesian perspective. For each scenario, we investigate the precision achievable with single-mode Gaussian states under homodyne and heterodyne detection. This allows us to identify Bayesian estimation strategies that combine good performance with the potential for straightforward experimental realization using Gaussian states and measurements. Our results provide practical solutions for reaching uncertainty regimes where local estimation techniques apply, thus bridging the gap to conditions where asymptotically optimal strategies can be employed.
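To give a flavour of the Bayesian treatment, the sketch below performs conjugate-Gaussian updating for displacement estimation with homodyne detection on a coherent probe; the quadrature convention (outcome distributed as N(√2·d, 1/2) for a real displacement d, with ħ = 1) and the prior are assumptions for the example, not the paper's exact setting.

```python
import numpy as np

# Conjugate-Gaussian sketch of Bayesian displacement estimation with homodyne
# detection on a coherent probe. Convention (an assumption): the x-quadrature
# outcome is N(sqrt(2)*d, 1/2) for a real displacement d, with hbar = 1; the
# Gaussian prior on d is also an assumption for the example.
rng = np.random.default_rng(3)
d_true, var_vac = 0.3, 0.5
mu, var = 0.0, 4.0                        # prior mean and variance on d

for _ in range(50):
    x = rng.normal(np.sqrt(2) * d_true, np.sqrt(var_vac))   # homodyne outcome
    prec = 1 / var + 2 / var_vac          # conjugate update for x | d ~ N(sqrt(2) d, var_vac)
    mu = (mu / var + np.sqrt(2) * x / var_vac) / prec
    var = 1 / prec

print(f"posterior: {mu:.3f} +/- {np.sqrt(var):.3f} (true {d_true})")
```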