Differential Dynamic Programming (DDP) is an indirect method for trajectory optimization. Its efficiency derives from exploiting the temporal structure inherent to optimal control problems and from the explicit roll-out/integration of the system dynamics. However, it suffers from numerical instability and, compared with direct methods, offers limited initialization options (it allows initialization of the controls, but not of the states) and lacks proper handling of control constraints. These limitations stem from the fact that DDP is a single-shooting algorithm. In this work, we tackle these issues with a direct-indirect hybridization approach that is primarily driven by the dynamic feasibility of the optimal control problem. Our feasibility search emulates the numerical resolution of a direct transcription problem with only dynamics constraints, namely a multiple-shooting formulation. We show that our approach converges more reliably than Box-DDP (a shooting method), and that its convergence rate and runtime are competitive with state-of-the-art direct transcription formulations solved using the interior-point and active-set algorithms available in KNITRO. We further show that our approach decreases the dynamic feasibility error monotonically, as in state-of-the-art nonlinear programming algorithms. We demonstrate the benefits of our hybrid approach by generating complex and athletic motions for quadruped and humanoid robots.
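To make the multiple-shooting connection concrete, here is a minimal sketch (our illustration, not code from the paper) of the dynamic-feasibility gaps that arise when each shooting segment is integrated independently of the next decision state; the dynamics function `f` and the names `shooting_gaps`/`feasibility_error` are assumed placeholders.

```python
import numpy as np

def shooting_gaps(f, xs, us):
    """Defect (gap) between the rollout of each shooting node and the
    next state decision variable: gap_k = f(x_k, u_k) - x_{k+1}.

    f  : discrete dynamics, f(x, u) -> next state (assumed signature)
    xs : (N+1, nx) array of state decision variables
    us : (N, nu) array of control decision variables
    """
    gaps = np.zeros_like(xs[1:])
    for k in range(len(us)):
        gaps[k] = f(xs[k], us[k]) - xs[k + 1]
    return gaps

def feasibility_error(gaps):
    """One possible dynamic-feasibility measure: the largest gap magnitude.
    A feasibility-driven solver shrinks this toward zero monotonically."""
    return np.abs(gaps).max()
```

In a pure single-shooting method these gaps are zero by construction; allowing nonzero gaps is what makes state initialization possible.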
The work presented here is a novel bio-inspired approach for the compliant control of a robotic arm in real time (RT). We integrate a spiking cerebellar network at the core of a feedback control loop performing torque-driven control. The spiking cerebellar controller provides torque commands that allow for accurate and coordinated arm movements. To compute these output motor commands, the spiking cerebellar controller receives the robot's sensory signals, the robot's goal behavior, and an instructive signal. These input signals are translated into a set of evolving spiking patterns that uniquely represent a specific system state at every point in time. Spike-timing-dependent plasticity (STDP) then operates on these patterns, enabling adaptive control: as STDP is deployed, the spiking cerebellar controller continuously adapts, from experience, the torque commands provided to the robot. The adaptive torque commands, in turn, help the spiking cerebellar controller cope with the elastic elements built into the robot's actuators, which mimic inherently elastic human muscles. We propose a natural integration of a bio-inspired control scheme, based on the cerebellum, with a compliant robot. We show that our compliant approach outperforms, in accuracy, the default factory-installed position control in a set of tasks used for addressing cerebellar motor behavior: controlling six degrees of freedom (DoF) in smooth movements, fast ballistic movements, and compliant movements in unstructured scenarios.
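As a generic sketch of the plasticity rule named in the abstract (pair-based STDP, not the authors' specific cerebellar model), a synapse is potentiated when a presynaptic spike precedes a postsynaptic one and depressed otherwise; all parameter values below are illustrative assumptions.

```python
import numpy as np

def stdp_update(w, t_pre, t_post,
                a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0,
                w_min=0.0, w_max=1.0):
    """Pair-based STDP: update weight w from one pre/post spike-time pair (ms).

    Pre before post (dt > 0) -> potentiation, scaled by exp(-dt/tau_plus).
    Post before pre (dt < 0) -> depression, scaled by exp(dt/tau_minus).
    """
    dt = t_post - t_pre
    if dt > 0:
        w += a_plus * np.exp(-dt / tau_plus)
    else:
        w -= a_minus * np.exp(dt / tau_minus)
    return float(np.clip(w, w_min, w_max))
```

Because the update depends only on relative spike timing, the controller can keep adapting online while the arm moves, which is what allows the torque commands to track the actuators' elastic behavior.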
While multiple studies have proposed methods for the formation control of unmanned aerial vehicles (UAVs), the trajectories generated are generally unsuitable for tracking targets when optimal coverage of the target by the formation is required at all times. We propose a path-planning approach called the Flux Guided (FG) method, which generates collision-free trajectories while maximising the coverage of one or more targets. We show that by reformulating an existing least-squares flux-minimisation problem as a constrained optimisation problem, the paths obtained are $1.5\times$ shorter and track directly toward the target. We also demonstrate that the scale of the formation can be controlled during flight, and that this feature can be used to track multiple scattered targets. The method is highly scalable, since the planning algorithm is only required for a subset of UAVs on the open boundary of the formation's surface. Finally, by simulating a 3D dynamic particle system that tracks the desired trajectories using a PID controller, we show that the resulting trajectories, after time-optimal parameterisation, are suitable for robotic control.
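One plausible reading of the reformulation, sketched below under our own assumptions rather than from the paper, is to trade the least-squares flux objective for a hard flux (coverage) constraint and minimise path length instead; `flux`, `path_length`, and the bound `c` are hypothetical placeholders, since the paper's actual flux functional is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

def path_length(z, n_wp, dim=3):
    """Total length of a piecewise-linear path given as flattened waypoints."""
    wp = z.reshape(n_wp, dim)
    return np.linalg.norm(np.diff(wp, axis=0), axis=1).sum()

def solve_constrained_path(flux, z0, n_wp, c):
    """Minimise path length subject to a coverage constraint flux(z) >= c,
    instead of minimising a least-squares flux residual directly.
    A generic sketch, not the paper's exact formulation."""
    cons = [{"type": "ineq", "fun": lambda z: flux(z) - c}]
    res = minimize(path_length, z0, args=(n_wp,), method="SLSQP",
                   constraints=cons)
    return res.x.reshape(n_wp, -1)
```

Keeping coverage as a constraint rather than a penalty is one way shorter, more direct paths could emerge: the optimiser no longer over-weights the flux term along the whole trajectory.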
Network graphs have become a popular tool to represent complex systems composed of many interacting subunits; in neuroscience especially, network graphs are increasingly used to represent and analyze functional interactions between neural sources. Interactions are often reconstructed using pairwise bivariate analyses, which overlook their multivariate nature: they neglect that investigating the effect of one source on a target requires taking all other sources into account as potential nuisance variables, and that combinations of sources may act jointly on a given target. Bivariate analyses therefore produce networks that may contain spurious interactions, which reduce the interpretability of the network and of its graph metrics. A truly multivariate reconstruction, however, is computationally intractable due to the combinatorial explosion in the number of potential interactions. We therefore have to resort to approximate methods to handle the intractability of multivariate interaction reconstruction, and thereby enable the use of networks in neuroscience. Here, we suggest such an approximate approach in the form of an algorithm that extends fast bivariate interaction reconstruction by identifying potentially spurious interactions post hoc: the algorithm flags potentially spurious edges, which may then be pruned from the network. This produces a statistically conservative network approximation that is guaranteed to contain non-spurious interactions only. We describe the algorithm and present a reference implementation to test its performance. We discuss the algorithm in relation to other approximate multivariate methods and highlight suitable application scenarios. Our approach is a tractable and data-efficient way of reconstructing approximate networks of multivariate interactions. It is preferable when the available data are limited or when fully multivariate approaches are computationally infeasible.
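The post-hoc flagging idea can be illustrated with a simple conditional-dependence test: an edge is flagged when its source carries no significant information about the target once the target's other bivariate parents are conditioned on. The sketch below is ours, with partial correlation standing in for whatever statistic the authors' algorithm uses; `flag_spurious_edges` and its parameters are hypothetical.

```python
import numpy as np
from scipy import stats

def flag_spurious_edges(X, edges, alpha=0.05):
    """Flag edges (s, t) from a bivariate analysis as potentially spurious
    when s and t are conditionally independent given t's other parents.

    X     : (n_samples, n_sources) data matrix
    edges : list of (source, target) index pairs
    """
    parents = {}
    for s, t in edges:
        parents.setdefault(t, []).append(s)
    flagged = []
    for s, t in edges:
        others = [p for p in parents[t] if p != s]
        if not others:
            continue
        # Partial correlation of (s, t) given the other parents:
        # regress both on the conditioning set and correlate the residuals.
        Z = np.column_stack([X[:, others], np.ones(len(X))])
        r_s = X[:, s] - Z @ np.linalg.lstsq(Z, X[:, s], rcond=None)[0]
        r_t = X[:, t] - Z @ np.linalg.lstsq(Z, X[:, t], rcond=None)[0]
        _, p = stats.pearsonr(r_s, r_t)
        if p > alpha:
            flagged.append((s, t))
    return flagged
```

Note the asymmetry driving tractability: only edges already found by the fast bivariate pass are ever tested, so the combinatorial search over all source combinations is avoided at the cost of a conservative (prune-only) approximation.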
Reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision-making and control tasks. In this paper, we classify RL into direct and indirect RL according to how the optimal policy of the Markov decision process is sought. Direct RL solves for the optimal policy by directly maximizing an objective function, usually the expectation of the accumulated future reward, with gradient-based methods. Indirect RL finds the optimal policy by solving the Bellman equation, the necessary and sufficient condition from Bellman's principle of optimality. We study the policy gradient forms of direct and indirect RL and show that both give rise to the actor-critic architecture and can be unified into a policy gradient with the approximate value function and the stationary state distribution, revealing the equivalence of direct and indirect RL. We employ a Gridworld task to examine the influence of the different policy gradient forms, illustrating their differences and relationships experimentally. Finally, we classify current mainstream RL algorithms using the direct/indirect taxonomy, alongside other taxonomies such as value-based vs. policy-based and model-based vs. model-free.
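The two routes to the optimal policy can be contrasted on a toy tabular MDP; the sketch below is our illustration of the taxonomy, not the paper's experiment, and all sizes and constants are arbitrary.

```python
import numpy as np

# Toy tabular MDP with a known model: P[s, a] is a distribution over s'.
rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.random((nS, nA))

# Indirect RL: solve the Bellman optimality equation by value iteration.
Q = np.zeros((nS, nA))
for _ in range(1000):
    Q = R + gamma * P @ Q.max(axis=1)   # Bellman backup
pi_indirect = Q.argmax(axis=1)

# Direct RL: gradient ascent on the expected return of a softmax policy.
# For a softmax policy, grad log pi(a|s) = onehot(a) - pi(.|s), so one
# REINFORCE step for a sampled (state, action, return) tuple is:
def pg_update(theta, s, a, G, lr=0.1):
    pi_s = np.exp(theta[s]) / np.exp(theta[s]).sum()
    grad_logp = -pi_s
    grad_logp[a] += 1.0
    theta[s] += lr * G * grad_logp
    return theta
```

The paper's unification says these are two faces of the same object: replacing the sampled return G with an approximate value function turns the direct update into an actor-critic step, which is exactly what solving the Bellman equation supplies on the indirect side.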
This paper presents a state- and state-input-constrained variant of the discrete-time iterative Linear Quadratic Regulator (iLQR) algorithm, with linear time complexity in the number of time steps. The approach is based on a projection of the control input onto the nullspace of the linearized constraints. We derive a fully constraint-compliant feedforward-feedback control update rule, which we can solve for efficiently with Riccati-style difference equations. We assume that the relative degree of all constraints in the discrete-time system model is equal to one, which often holds for robotics problems employing rigid-body dynamic models. Simulation examples, including a 6-DoF robotic arm, are given to validate and illustrate the performance of the method.
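A minimal sketch of the nullspace-projection idea for a single time step is given below; the full method embeds this in Riccati-style recursions, which are omitted, and the names `du_unc`, `D`, and `d` are our placeholders for the unconstrained update and the linearized constraint data.

```python
import numpy as np

def project_control_update(du_unc, D, d):
    """Make an unconstrained iLQR control update du_unc comply with the
    linearized constraint D @ du = d at one time step.

    du_particular satisfies the constraint with minimum norm; the freedom
    left in du_unc is kept only in the nullspace of D, so the combined
    update still satisfies D @ du = d exactly.
    """
    D_pinv = np.linalg.pinv(D)
    du_particular = D_pinv @ d              # minimum-norm particular solution
    N = np.eye(D.shape[1]) - D_pinv @ D     # projector onto null(D)
    return du_particular + N @ du_unc
```

Since the projection is applied stage by stage, the cost per time step stays constant and the overall complexity remains linear in the horizon length.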