
A Self-adaptive SAC-PID Control Approach based on Reinforcement Learning for Mobile Robots

Added by Yuehai Fan
Publication date: 2021
Language: English





Proportional-integral-derivative (PID) control is the most widely used control method in industrial automation, robotics, and other fields. However, traditional PID control falls short when the system cannot be accurately modeled and the operating environment varies in real time. To tackle these problems, we propose a self-adaptive, model-free SAC-PID control approach based on reinforcement learning for the automatic control of mobile robots. A new hierarchical structure is developed: an upper controller based on soft actor-critic (SAC), one of the most competitive continuous control algorithms, and a lower controller based on an incremental PID controller. The SAC policy receives the dynamic information of the mobile robot as input and outputs the parameters of the incremental PID controllers, compensating in real time for the tracking error between the mobile robot and the path. In addition, a combination of the 24-neighborhood method and polynomial fitting is developed to improve the adaptability of the SAC-PID control method to complex environments. The effectiveness of the SAC-PID control method is verified on paths of varying difficulty, both in Gazebo and on a real mecanum-wheeled mobile robot. Furthermore, compared with fuzzy PID control, the SAC-PID method offers strong robustness, generalization, and real-time performance.
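
The abstract gives no code, but the lower-level controller it names is the standard incremental PID law u(k) = u(k-1) + Kp(e(k) - e(k-1)) + Ki e(k) + Kd(e(k) - 2e(k-1) + e(k-2)). Below is a minimal Python sketch of that lower controller, assuming the SAC upper controller supplies the gains (kp, ki, kd) at every control step; the policy call and gain values are placeholders, not the authors' implementation.

    # Minimal sketch of the lower-level incremental PID controller.
    # The SAC upper controller would output (kp, ki, kd) every step;
    # here they are simply passed in as arguments.

    class IncrementalPID:
        def __init__(self):
            self.e1 = 0.0  # e(k-1), previous error
            self.e2 = 0.0  # e(k-2), error two steps back
            self.u = 0.0   # accumulated control output u(k-1)

        def step(self, error, kp, ki, kd):
            """Compute u(k) from the tracking error and the gains
            proposed by the upper (SAC) controller for this step."""
            # du = Kp*(e - e1) + Ki*e + Kd*(e - 2*e1 + e2)
            du = (kp * (error - self.e1)
                  + ki * error
                  + kd * (error - 2.0 * self.e1 + self.e2))
            self.u += du                  # u(k) = u(k-1) + du
            self.e2, self.e1 = self.e1, error
            return self.u

    # Hypothetical control loop: a SAC policy would map the robot's
    # state (e.g., path error and its history) to PID gains online.
    pid = IncrementalPID()
    # kp, ki, kd = sac_policy(state)     # upper controller (not shown)
    kp, ki, kd = 1.2, 0.05, 0.3          # placeholder gains
    u = pid.step(error=0.1, kp=kp, ki=ki, kd=kd)

The incremental form only accumulates gain-weighted error differences, which is what lets the upper controller retune the gains at every step without the output jumping.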



Related research

State-of-the-art distributed algorithms for reinforcement learning rely on multiple independent agents, which simultaneously learn in parallel environments while asynchronously updating a common, shared policy. Moreover, decentralized control architectures (e.g., CPGs) can coordinate spatially distributed portions of an articulated robot to achieve system-level objectives. In this work, we investigate the relationship between distributed learning and decentralized control by learning decentralized control policies for the locomotion of articulated robots in challenging environments. To this end, we present an approach that leverages the structure of the asynchronous advantage actor-critic (A3C) algorithm to provide a natural means of learning decentralized control policies on a single articulated robot. Our primary contribution shows that individual agents in the A3C algorithm can be defined by independently controlled portions of the robot's body, thus enabling distributed learning on a single robot for efficient hardware implementation. We present results of closed-loop locomotion in unstructured terrains on a snake and a hexapod robot, using decentralized controllers learned offline and online, respectively. Preprint of the paper submitted to the IEEE Transactions on Robotics (T-RO) journal in October 2018, and accepted for publication as a regular paper in May 2019.
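
The core architectural idea, one A3C worker per independently controlled body segment, all asynchronously updating a single shared policy, can be sketched abstractly. The gradient and the segment count below are stand-ins chosen for illustration; this shows only the asynchronous shared-parameter structure, not the authors' networks or environments.

    import threading
    import numpy as np

    shared_weights = np.zeros(8)      # one shared policy, updated by all segments
    lock = threading.Lock()

    def fake_gradient(weights, rng):
        # Stand-in for the actor-critic gradient each segment would
        # compute from its own local observations and reward.
        return rng.normal(size=weights.shape) * 0.01

    def segment_worker(segment_id, steps=100, lr=0.1):
        rng = np.random.default_rng(segment_id)
        for _ in range(steps):
            grad = fake_gradient(shared_weights, rng)
            with lock:                # asynchronous update of the shared policy
                np.subtract(shared_weights, lr * grad, out=shared_weights)

    # One worker per independently controlled portion of the robot's
    # body, e.g. six leg modules of a hexapod.
    threads = [threading.Thread(target=segment_worker, args=(i,)) for i in range(6)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()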
Developing robust walking controllers for bipedal robots is a challenging endeavor. Traditional model-based locomotion controllers require simplifying assumptions and careful modelling; any small errors can result in unstable control. To address these challenges for bipedal locomotion, we present a model-free reinforcement learning framework for training robust locomotion policies in simulation, which can then be transferred to a real bipedal Cassie robot. To facilitate sim-to-real transfer, domain randomization is used to encourage the policies to learn behaviors that are robust across variations in system dynamics. The learned policies enable Cassie to perform a set of diverse and dynamic behaviors, while also being more robust than traditional controllers and prior learning-based methods that use residual control. We demonstrate this on versatile walking behaviors such as tracking a target walking velocity, walking height, and turning yaw.
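
Domain randomization as described above amounts to resampling the simulator's physical parameters before every training episode, so the policy cannot overfit to any single dynamics model. The parameter names, ranges, and simulator calls in this sketch are illustrative assumptions, not the authors' configuration.

    import random

    def sample_dynamics():
        # Resampled once per episode; ranges are illustrative.
        return {
            "link_mass_scale":    random.uniform(0.8, 1.2),
            "joint_damping":      random.uniform(0.5, 2.0),
            "ground_friction":    random.uniform(0.6, 1.4),
            "motor_torque_scale": random.uniform(0.9, 1.1),
            "sensor_delay_ms":    random.uniform(0.0, 20.0),
        }

    def train(env, policy, num_episodes):
        for _ in range(num_episodes):
            env.set_dynamics(sample_dynamics())   # hypothetical simulator call
            trajectory = env.run_episode(policy)  # hypothetical rollout call
            policy.update(trajectory)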
The security of mobile robots has attracted considerable attention in recent years. Most existing works focus on detection and countermeasures for classic attacks from cyberspace. Nevertheless, those works are generally based on prior assumptions about the attacker (e.g., that the system dynamics are known or that internal access has been compromised). A few works are dedicated to physical attacks; however, they still lack intelligence and advanced control design. In this paper, we propose an intelligent, physical attack framework against the obstacle avoidance of mobile robots. The novelty of our work lies in the following: i) without any prior information about the system dynamics, the attacker can learn the detection area and goal position of a mobile robot by trial and observation, and the obstacle-avoidance mechanism is learned by the support vector regression (SVR) method; ii) considering different attack requirements, different attack strategies are proposed to implement the attack efficiently; iii) the framework is suitable for both holonomic and non-holonomic mobile robots, and an analysis of the algorithm's time complexity and optimality is provided. Furthermore, a condition is obtained that guarantees the success of the attack. Simulations illustrate the effectiveness of the proposed framework.
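
Step (i), learning the victim's avoidance mechanism by regression, can be sketched with an off-the-shelf SVR. The data below is synthetic, standing in for the (probe, response) pairs the attacker would collect by trial and observation of the victim robot; the feature and response choices are assumptions made for the sketch.

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)

    # Features: obstacle position relative to the robot (x, y).
    X = rng.uniform(-2.0, 2.0, size=(200, 2))
    # Response: observed heading change of the robot (a synthetic
    # stand-in for the unknown avoidance law, plus observation noise).
    y = (np.arctan2(X[:, 1], X[:, 0]) * np.exp(-np.linalg.norm(X, axis=1))
         + rng.normal(0.0, 0.02, size=200))

    model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, y)

    # The attacker can now predict how the robot will react to an
    # obstacle placed at a chosen relative position, and plan
    # placements accordingly.
    print(model.predict([[0.5, -0.3]]))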
We present an active visual search model for finding objects in unknown environments. The proposed algorithm guides the robot towards the sought object using the relevant stimuli provided by the visual sensors. Existing search strategies are either purely reactive or use simplified sensor models that do not exploit all the visual information available. In this paper, we propose a new model that actively extracts visual information via visual attention techniques and, in conjunction with a non-myopic decision-making algorithm, leads the robot to search more relevant areas of the environment. The attention module couples both top-down and bottom-up attention models enabling the robot to search regions with higher importance first. The proposed algorithm is evaluated on a mobile robot platform in a 3D simulated environment. The results indicate that the use of visual attention significantly improves search, but the degree of improvement depends on the nature of the task and the complexity of the environment. In our experiments, we found that performance enhancements of up to 42% in structured and 38% in highly unstructured cluttered environments can be achieved using visual attention mechanisms.
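
The fusion of bottom-up and top-down attention can be illustrated with a toy saliency computation: an image-driven contrast cue combined with a cue for similarity to the sought object's appearance, with the most salient region searched first. The cues and weights below are deliberate simplifications for the sketch, not the paper's attention model.

    import numpy as np

    def bottom_up(image):
        # Crude image-driven cue: deviation from the global mean.
        return np.abs(image - image.mean())

    def top_down(image, target_intensity):
        # Task-driven cue: similarity to a known property of the target.
        return 1.0 / (1.0 + np.abs(image - target_intensity))

    def attention_map(image, target_intensity, w_bu=0.4, w_td=0.6):
        bu = bottom_up(image)
        td = top_down(image, target_intensity)
        # Normalise each cue to [0, 1] before fusing.
        bu = (bu - bu.min()) / (bu.max() - bu.min() + 1e-9)
        td = (td - td.min()) / (td.max() - td.min() + 1e-9)
        return w_bu * bu + w_td * td

    image = np.random.default_rng(1).uniform(0.0, 1.0, size=(48, 64))
    amap = attention_map(image, target_intensity=0.9)
    # Region to inspect first: the most attention-grabbing location.
    print(np.unravel_index(np.argmax(amap), amap.shape))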
Path planning and collision avoidance are challenging in complex and highly variable environments due to the limited horizon of events. The literature offers multiple model- and learning-based approaches, but they require significant computational resources to deploy effectively and may have limited generality. We propose a planning algorithm based on a globally stable passive controller that can plan smooth trajectories using limited computational resources in challenging environmental conditions. The architecture combines the recently proposed fractal impedance controller with elastic bands and regions of finite-time invariance. As the method is based on an impedance controller, it can also be used directly as a force/torque controller. We validated our method in simulation, analysing its ability to navigate interactively in challenging concave domains via the issuing of via-points, and its robustness to low-bandwidth feedback. A swarm simulation with 11 agents validated the scalability of the proposed method. We performed hardware experiments on a holonomic wheeled platform, validating smoothness and robustness of interaction with dynamic agents (i.e., humans and robots). The low computational complexity of the proposed local planner enables deployment on low-power microcontrollers, lowering energy consumption compared to methods that rely upon numeric optimisation.
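
Since the planner rides on an impedance controller, a generic spring-damper impedance law pulling the robot toward a via-point gives a useful point of reference. The fractal impedance controller itself shapes this force nonlinearly to guarantee passivity; that shaping is not reproduced here, and all gains below are illustrative.

    import numpy as np

    def impedance_force(x, x_dot, x_des, k=20.0, d=6.0):
        """Virtual spring-damper: F = K (x_des - x) - D x_dot."""
        return k * (x_des - x) - d * x_dot

    # Toy point-mass simulation stepping toward a via-point at (1, 1).
    x = np.zeros(2)          # position
    v = np.zeros(2)          # velocity
    dt, mass = 0.01, 1.0
    for _ in range(500):
        f = impedance_force(x, v, x_des=np.array([1.0, 1.0]))
        v += (f / mass) * dt
        x += v * dt
    print(x)                 # converges close to the via-point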
