Compliant robots have seen successful applications in energy-efficient locomotion and cyclic manipulation. However, the exploitation of variable physical impedance for energy-efficient sequential movements has not been extensively addressed. This work employs a hierarchical approach that encapsulates low-level optimal control for sub-movement generation within an outer loop of iterative policy improvement, thereby leveraging the benefits of both optimal control and reinforcement learning. The framework optimizes the efficiency trade-off for minimal energy expenditure in a model-free manner by taking into account cost function weighting, variable impedance exploitation, and transition timing -- all of which are associated with the skill of compliance. The effectiveness of the proposed method is evaluated on two consecutive reaching tasks with a variable impedance actuator. The results demonstrate significant energy savings from improving the skill of compliance, with a reduction in electrical consumption of about 30% measured in a physical robot experiment.
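The hierarchical structure described above can be illustrated with a toy example: an inner simulated two-target reach under a spring-like control law, wrapped in a model-free outer loop that searches over stiffness and the sub-movement transition time. This is a minimal sketch under illustrative assumptions, not the paper's method: the 1-DoF actuator model, the random-search outer loop, the quadratic energy proxy, and all function names are hypothetical.

```python
import numpy as np

def reach_energy(stiffness, switch_time, targets=(0.5, 1.0),
                 mass=1.0, damping=2.0, dt=1e-3, T=2.0):
    """Simulate two consecutive reaches with a spring-like actuator;
    return (tracking_error, electrical_energy_proxy)."""
    x, v = 0.0, 0.0
    energy, error = 0.0, 0.0
    for i in range(int(T / dt)):
        t = i * dt
        goal = targets[0] if t < switch_time else targets[1]
        f = stiffness * (goal - x) - damping * v  # elastic control force
        v += (f / mass) * dt
        x += v * dt
        energy += f**2 * dt          # crude proxy for electrical consumption
        error += (goal - x)**2 * dt
    return error, energy

def outer_loop(n_iters=200, seed=0):
    """Model-free outer loop: random search over stiffness and the
    sub-movement transition time, trading tracking error against energy."""
    rng = np.random.default_rng(seed)
    best, best_cost = None, np.inf
    for _ in range(n_iters):
        k = rng.uniform(5.0, 80.0)    # stiffness sample
        ts = rng.uniform(0.2, 1.5)    # transition-time sample
        err, en = reach_energy(k, ts)
        cost = err + 1e-3 * en        # weighted efficiency trade-off
        if cost < best_cost:
            best, best_cost = (k, ts), cost
    return best, best_cost
```

In the actual framework, the inner loop is an optimal controller and the outer loop is iterative policy improvement; the random search here merely shows how transition timing and impedance parameters enter the same trade-off.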
Energy efficiency is a crucial issue for the long-term deployment of compliant robots in the real world. In the context of variable impedance actuators (VIAs), one of the main focuses has been on improving energy efficiency through the reduction of energy consumption. However, the harvesting of dissipated energy in such systems remains under-explored. This study proposes a novel variable damping module design that enables energy regeneration in VIAs by exploiting the regenerative braking effect of DC motors. The proposed damping module uses four switches to combine regenerative and dynamic braking in a hybrid approach that enables energy regeneration without reducing the achievable damping range. A physical implementation on a simple VIA mechanism is presented, in which the regenerative properties of the proposed module are characterised and compared against theoretical predictions. To investigate the role of variable regenerative damping in the energy efficiency of long-term operation, experiments are reported in which the VIA equipped with the proposed damping module performs sequential reaching to a series of stochastic targets. The results indicate that the combination of variable stiffness and variable regenerative damping achieves the best trade-off between task performance and energy efficiency, yielding a 25% improvement in an overall performance metric (incorporating reaching accuracy, settling time, energy consumption and regeneration) over comparable schemes in which either stiffness or damping is fixed.
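The motivation for the hybrid scheme can be sketched numerically: pure regenerative braking only works while the back-EMF exceeds the bus voltage, so it loses braking authority at low speed, whereas a hybrid that falls back to dynamic braking keeps the full damping range. The following toy spin-down model is an illustrative assumption, not the paper's circuit; the motor constants, bus voltage, and mode names are hypothetical.

```python
def brake(omega0, mode, kt=0.05, R=1.0, J=1e-3, v_bus=2.0,
          dt=1e-4, t_end=1.0):
    """Spin-down of a DC motor under different braking modes.
    Returns (final_speed, energy_recovered)."""
    omega, recovered = omega0, 0.0
    for _ in range(int(t_end / dt)):
        emf = kt * omega                   # back-EMF (V); ke == kt in SI units
        if mode == "regenerative" and emf > v_bus:
            i = (emf - v_bus) / R          # current pushed into the supply
            recovered += v_bus * i * dt    # energy returned to the bus
        elif mode == "dynamic":
            i = emf / R                    # shorted winding: heat only
        elif mode == "hybrid":
            if emf > v_bus:                # regenerate while possible...
                i = (emf - v_bus) / R
                recovered += v_bus * i * dt
            else:                          # ...then fall back to dynamic braking
                i = emf / R
        else:
            i = 0.0                        # regenerative mode below threshold
        omega -= (kt * i / J) * dt         # braking torque decelerates rotor
    return omega, recovered
```

Below the back-EMF threshold the pure regenerative mode produces no braking torque at all, while the hybrid keeps decelerating; this is the qualitative point of combining the two modes with the four-switch topology.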
Robots that physically interact with their surroundings, in order to accomplish tasks or assist humans in their activities, must exploit contact forces in a safe and proficient manner. Impedance control is considered a prominent approach in robotics for avoiding large impact forces while operating in unstructured environments. In such environments, the conditions under which interaction occurs may vary significantly during task execution. This demands that robots be endowed with on-line adaptation capabilities to cope with sudden and unexpected changes in the environment. In this context, variable impedance control arises as a powerful tool to modulate the robot's behavior in response to variations in its surroundings. In this survey, we present the state of the art of approaches to variable impedance control from control and learning perspectives (separately and jointly). Moreover, we propose a new taxonomy for mechanical impedance based on variability, learning, and control. The objective of this survey is to bring together the concepts and efforts made so far in this field, and to describe the advantages and disadvantages of each approach. The survey concludes with open issues in the field and an envisioned framework that may potentially address them.
Robust multi-agent trajectory prediction is essential for the safe control of robots and vehicles that interact with humans. Many existing methods treat social and temporal information separately and therefore fall short of modelling the joint future trajectories of all agents in a socially consistent way. To address this, we propose a new class of Latent Variable Sequential Set Transformers, which autoregressively model multi-agent trajectories. We refer to these architectures as AutoBots. AutoBots model the contents of sets (e.g. representing the properties of agents in a scene) over time, and employ multi-head self-attention blocks over these sequences of sets to encode the socio-temporal relationships between the different actors in a scene. This produces either the trajectory of one ego-agent or a distribution over the future trajectories of all agents under consideration. Our approach works for general sequences of sets, and we provide illustrative experiments modelling the sequential structure of the multiple strokes that make up symbols in the Omniglot dataset. For the single-agent prediction case, we validate our model on the NuScenes motion prediction task and achieve competitive results on the global leaderboard. In the multi-agent forecasting setting, we validate our model on TrajNet. We find that our method outperforms physical extrapolation and recurrent-network baselines and generates scene-consistent trajectories.
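The idea of attending over sequences of sets can be sketched, in highly simplified form, as alternating self-attention over the agent (set) axis and the time axis. This single-head NumPy sketch is an illustrative assumption: it omits the learned projections, multi-head splits, and latent-variable decoder of AutoBots, and only demonstrates that attention over the set axis is permutation-equivariant in the agents.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product self-attention; q, k, v have shape (n, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)   # row-wise softmax
    return w @ v

def social_temporal_encode(x):
    """x: (T, A, d) sequence of agent sets.
    Alternate attention over the set axis (social) and the time axis
    (temporal), in the spirit of a socio-temporal set encoder."""
    T, A, _ = x.shape
    # social: at each timestep, agents attend to one another (order-invariant)
    social = np.stack([attention(x[t], x[t], x[t]) for t in range(T)])
    # temporal: each agent attends over its own history
    return np.stack([attention(social[:, a], social[:, a], social[:, a])
                     for a in range(A)], axis=1)
```

Permuting the agents in the input permutes the output in the same way, which is the property that lets a set-based encoder treat a scene as an unordered collection of actors.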
Many manipulation tasks require robots to interact with unknown environments. In such applications, the ability to adapt the impedance according to different task phases and environment constraints is crucial for safety and performance. Although many approaches based on deep reinforcement learning (RL) and learning from demonstration (LfD) have been proposed to obtain variable impedance skills for contact-rich manipulation tasks, these skills are typically task-specific and can be sensitive to changes in task settings. This paper proposes an inverse reinforcement learning (IRL) based approach to recover both the variable impedance policy and the reward function from expert demonstrations. We explore different action spaces for the reward function to achieve a more general representation of expert variable impedance skills. Experiments on two variable impedance tasks (Peg-in-Hole and Cup-on-Plate) were conducted both in simulation and on a real FANUC LR Mate 200iD/7L industrial robot. Comparisons with behavior cloning and force-based IRL show that the reward function learned in the gain action space transfers better than one learned in the force space. Experiment videos are available at https://msc.berkeley.edu/research/impedance-irl.html.
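The distinction between a force action space and a gain action space can be made concrete with a standard impedance control law: in the gain space, the policy outputs stiffness and damping gains (kp, kd) and the torque follows from the impedance law, rather than being commanded directly. The 1-DoF rollout below is a generic illustration under assumed dynamics, not the paper's setup; all parameter values are hypothetical.

```python
def impedance_torque(q, qd, q_des, kp, kd):
    """Impedance control law: torque from stiffness/damping gains."""
    return kp * (q_des - q) - kd * qd

def rollout(kp, kd, q_des=1.0, mass=1.0, dt=1e-3, T=2.0, f_ext=0.0):
    """Integrate a 1-DoF joint under the impedance law with a constant
    external force; a policy acting in the gain space would choose
    (kp, kd) per step instead of raw torques/forces."""
    q, qd = 0.0, 0.0
    for _ in range(int(T / dt)):
        tau = impedance_torque(q, qd, q_des, kp, kd) + f_ext
        qd += (tau / mass) * dt
        q += qd * dt
    return q
```

Under a constant external force the joint settles with an offset of roughly f_ext/kp, so a lower stiffness yields a more compliant response; this coupling between gains and contact behaviour is what makes the gain action space a natural representation for variable impedance skills.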
Efficient sampling from constraint manifolds, and thereby generating a diverse set of solutions to feasibility problems, is a fundamental challenge. We consider the case where a problem is factored, that is, the underlying nonlinear program is decomposed into differentiable equality and inequality constraints, each of which depends only on some of the variables. Such problems are at the core of efficient and robust sequential robot manipulation planning. Naive sequential conditional sampling of individual variables, as well as fully joint sampling of all variables at once (e.g., leveraging optimization methods), can be highly inefficient and non-robust. We propose a novel framework that learns how to break the overall problem into smaller sequential sampling problems. Specifically, we leverage Monte-Carlo Tree Search to learn assignment orders for the variable subsets that minimize the computation time needed to generate feasible full samples. This strategy allows us to efficiently compute a set of diverse valid robot configurations for mode switches within sequential manipulation tasks, which serve as waypoints for subsequent trajectory optimization or sampling-based motion planning. We show that the learning method quickly converges to the best sampling strategy for a given problem, outperforming user-defined orderings and fully joint optimization while providing higher sample diversity.
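Why the assignment order matters can be shown with a tiny factored problem: two variables a, b in [0, 1] with constraints a < 0.1 (unary) and |a - b| < 0.05 (binary). Sampling a first makes the conditional draw of b cheap; sampling b first usually makes a infeasible. The search below is a simple empirical-cost stand-in for the paper's Monte-Carlo Tree Search, and the problem, thresholds, and function names are all illustrative assumptions.

```python
import random

def sample_order(order, max_tries=200, rng=random):
    """Conditionally sample (a, b) with constraints a < 0.1 and |a - b| < 0.05
    in the given variable order. Returns draws used, capped at max_tries."""
    tries, vals = 0, {}
    for var in order:
        while True:
            tries += 1
            if tries >= max_tries:
                return max_tries          # give up: order was a bad choice
            x = rng.random()
            if var == "a":
                if x >= 0.1:              # unary constraint on a
                    continue
                if "b" in vals and abs(x - vals["b"]) >= 0.05:
                    continue              # binary constraint, if b already set
            else:
                if "a" in vals and abs(vals["a"] - x) >= 0.05:
                    continue              # binary constraint, if a already set
            vals[var] = x
            break
    return tries

def best_order(n=300, seed=1):
    """Stand-in for the learned ordering: estimate the expected sampling
    cost of each assignment order and keep the cheapest."""
    rng = random.Random(seed)
    costs = {}
    for order in (("a", "b"), ("b", "a")):
        costs[order] = sum(sample_order(order, rng=rng) for _ in range(n)) / n
    return min(costs, key=costs.get), costs
```

With only two orderings, exhaustive estimation suffices; MCTS plays the same role when the number of variable subsets makes enumerating orders intractable.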