Shaping Individualized Impedance Landscapes for Gait Training via Reinforcement Learning

Added by Damiano Zanotto

Publication date: 2021

Fields: Informatics Engineering

Language: English

Authors: Yufeng Zhang, Shuai Li, Karen J. Nolan

Robotics

Assist-as-needed (AAN) control aims at promoting therapeutic outcomes in robot-assisted rehabilitation by encouraging patients' active participation. Impedance control is used by most AAN controllers to create a compliant force field around a target motion to ensure tracking accuracy while allowing moderate kinematic errors. However, since the parameters governing the shape of the force field are often tuned manually or adapted online based on simplistic assumptions about subjects' learning abilities, the effectiveness of conventional AAN controllers may be limited. In this work, we propose a novel adaptive AAN controller that is capable of autonomously reshaping the force field in a phase-dependent manner according to each individual's motor abilities and task requirements. The proposed controller consists of a modified Policy Improvement with Path Integral algorithm, a model-free, sampling-based reinforcement learning method that learns a subject-specific impedance landscape in real time, and a hierarchical policy parameter evaluation structure that embeds the AAN paradigm by specifying performance-driven learning goals. The adaptability of the proposed control strategy to subjects' motor responses and its ability to promote short-term motor adaptations are experimentally validated through treadmill training sessions with able-bodied subjects who learned altered gait patterns with the assistance of a powered ankle-foot orthosis.
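The two ingredients named in the abstract, an impedance force field and a sampling-based policy update, can be illustrated with a minimal sketch. This is not the authors' modified algorithm or their hierarchical evaluation structure: it assumes a plain spring-damper law and a simplified textbook-style path-integral update (a cost-weighted average of sampled parameter perturbations). The function names, the cost normalisation, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def impedance_torque(stiffness, damping, angle_err, velocity_err):
    """Spring-damper force field around the target motion (assumed form)."""
    return stiffness * angle_err + damping * velocity_err

def pi2_style_update(theta, perturbations, costs, temperature=0.1):
    """One simplified path-integral update of the policy parameters.

    theta:         (n_params,) current parameters, e.g. stiffness values
                   at a set of gait phases
    perturbations: (n_rollouts, n_params) exploration noise per rollout
    costs:         (n_rollouts,) scalar cost per rollout (lower is better)
    """
    costs = np.asarray(costs, dtype=float)
    span = costs.max() - costs.min()
    # Normalise costs to [0, 1] so the temperature does not depend on cost scale.
    normalised = (costs - costs.min()) / span if span > 0 else np.zeros_like(costs)
    # Low-cost rollouts receive exponentially larger weights.
    weights = np.exp(-normalised / temperature)
    weights /= weights.sum()
    # Move the parameters toward the weighted average of the perturbations.
    return np.asarray(theta, dtype=float) + weights @ np.asarray(perturbations, dtype=float)
```

In a phase-dependent setting, `theta` could hold one stiffness value per gait-phase bin, so that each update reshapes the whole landscape at once; how the paper actually parameterises the landscape is not stated in the abstract.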

Learning Variable Impedance Control via Inverse Reinforcement Learning for Force-Related Tasks

114 - Xiang Zhang , Liting Sun , Zhian Kuang 2021

Many manipulation tasks require robots to interact with unknown environments. In such applications, the ability to adapt the impedance according to different task phases and environment constraints is crucial for safety and performance. Although many approaches based on deep reinforcement learning (RL) and learning from demonstration (LfD) have been proposed to obtain variable impedance skills on contact-rich manipulation tasks, these skills are typically task-specific and could be sensitive to changes in task settings. This paper proposes an inverse reinforcement learning (IRL) based approach to recover both the variable impedance policy and reward function from expert demonstrations. We explore different action spaces of the reward functions to achieve a more general representation of expert variable impedance skills. Experiments on two variable impedance tasks (Peg-in-Hole and Cup-on-Plate) were conducted both in simulation and on a real FANUC LR Mate 200iD/7L industrial robot. The comparison results with behavior cloning and force-based IRL showed that the learned reward function in the gain action space has better transferability than in the force space. Experiment videos are available at https://msc.berkeley.edu/research/impedance-irl.html.

Robotics

ROIAL: Region of Interest Active Learning for Characterizing Exoskeleton Gait Preference Landscapes

216 - Kejun Li , Maegan Tucker , Erdem Bıyık 2020

Characterizing what types of exoskeleton gaits are comfortable for users, and understanding the science of walking more generally, require recovering a user's utility landscape. Learning these landscapes is challenging, as walking trajectories are defined by numerous gait parameters, data collection from human trials is expensive, and user safety and comfort must be ensured. This work proposes the Region of Interest Active Learning (ROIAL) framework, which actively learns each user's underlying utility function over a region of interest that ensures safety and comfort. ROIAL learns from ordinal and preference feedback, which are more reliable feedback mechanisms than absolute numerical scores. The algorithm's performance is evaluated both in simulation and experimentally for three non-disabled subjects walking inside of a lower-body exoskeleton. ROIAL learns Bayesian posteriors that predict each exoskeleton user's utility landscape across four exoskeleton gait parameters. The algorithm discovers both commonalities and discrepancies across users' gait preferences and identifies the gait parameters that most influenced user feedback. These results demonstrate the feasibility of recovering gait utility landscapes from limited human trials.

Robotics Human-Computer Interaction Machine Learning

Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models

287 - Yuchen Wu , Melissa Mozifian , Florian Shkurti 2020

The potential benefits of model-free reinforcement learning to real robotics systems are limited by its uninformed exploration that leads to slow convergence, lack of data-efficiency, and unnecessary interactions with the environment. To address these drawbacks we propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential that is trained from demonstration data, using a generative model. We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first. Unlike the majority of existing methods that assume optimal demonstrations and incorporate the demonstration data as hard constraints on policy optimization, we instead incorporate demonstration data as advice in the form of a reward shaping potential trained as a generative model of states and actions. In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials. We show that, unlike many existing approaches that incorporate demonstrations as hard constraints, our approach is unbiased even in the case of suboptimal and noisy demonstrations. We present an extensive range of simulations, as well as experiments on the Franka Emika 7DOF arm, to demonstrate the practicality of our method.
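The idea of shaping the reward with a state-and-action-dependent potential can be sketched as follows. The sketch assumes the standard potential-based form r' = r + γΦ(s', a') − Φ(s, a), and it swaps in a single multivariate Gaussian fitted to demonstration state-action pairs as a stand-in for the normalizing flows and GANs examined in the paper. All function names and the regularisation constant are hypothetical.

```python
import numpy as np

def fit_gaussian_potential(demos, ridge=1e-6):
    """Fit a Gaussian to demonstration (state, action) vectors and return
    its log-density as the potential. A stand-in for a learned generative model.

    demos: (n_samples, dim) array with dim >= 2.
    """
    demos = np.asarray(demos, dtype=float)
    dim = demos.shape[1]
    mean = demos.mean(axis=0)
    # Small ridge term keeps the covariance invertible.
    cov = np.cov(demos, rowvar=False) + ridge * np.eye(dim)
    cov_inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)

    def potential(state_action):
        diff = np.asarray(state_action, dtype=float) - mean
        return -0.5 * (diff @ cov_inv @ diff + logdet + dim * np.log(2 * np.pi))

    return potential

def shaped_reward(reward, potential, state_action, next_state_action, gamma=0.99):
    """Environment reward plus the potential-based shaping term."""
    return reward + gamma * potential(next_state_action) - potential(state_action)
```

Because the shaping term is a difference of potentials, the bonuses telescope along a trajectory: with `gamma=1`, the shaping terms collected around any closed loop of state-action pairs sum to zero, which is why demonstrations used this way act as advice about where to explore rather than as a hard constraint.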

Robotics Machine Learning

Preference-Based Learning for Exoskeleton Gait Optimization

73 - Maegan Tucker , Ellen Novoseller , Claudia Kann 2019

This paper presents a personalized gait optimization framework for lower-body exoskeletons. Rather than optimizing numerical objectives such as the mechanical cost of transport, our approach directly learns from user preferences, e.g., for comfort. Building upon work in preference-based interactive learning, we present the CoSpar algorithm. CoSpar prompts the user to give pairwise preferences between trials and suggest improvements; as exoskeleton walking is a non-intuitive behavior, users can provide preferences more easily and reliably than numerical feedback. We show that CoSpar performs competitively in simulation and demonstrate a prototype implementation of CoSpar on a lower-body exoskeleton to optimize human walking trajectory features. In the experiments, CoSpar consistently found user-preferred parameters of the exoskeleton's walking gait, which suggests that it is a promising starting point for adapting and personalizing exoskeletons (or other assistive devices) to individual users.

Robotics

Optimizing Gait Libraries via a Coverage Metric

130 - Brian Bittner , Shai Revzen 2021

Many robots move through the world by composing locomotion primitives like steps and turns. To do so well, robots need not have primitives that make intuitive sense to humans. This becomes of paramount importance when robots are damaged and no longer move as designed. Here we propose a goal function we call coverage, that represents the usefulness of a library of locomotion primitives in a manner agnostic to the particulars of the primitives themselves. We demonstrate the ability to optimize coverage on both simulated and physical robots, and show that coverage can be rapidly recovered after injury. This suggests that by optimizing for coverage, robots can sustain their ability to navigate through the world even in the face of significant mechanical failures. The benefits of this approach are enhanced by sample-efficient, data-driven approaches to system identification that can rapidly inform the optimization of primitives. We found that the number of degrees of freedom improved the rate of recovery of our simulated robots, a rare result in the fields of gait optimization and reinforcement learning. We showed that a robot with limbs made of tree branches (for which no CAD model or first principles model was available) is able to quickly find an effective high-coverage library of motion primitives. The optimized primitives are entirely non-obvious to a human observer, and thus are unlikely to be attainable through manual tuning.

Robotics