ترغب بنشر مسار تعليمي؟ اضغط هنا

Learning Whole-body Motor Skills for Humanoids

85   0   0.0 ( 0 )
 نشر من قبل Zhibin Li PhD
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

This paper presents a hierarchical framework for Deep Reinforcement Learning that acquires motor skills for a variety of push recovery and balancing behaviors, i.e., ankle, hip, foot tilting, and stepping strategies. The policy is trained in a physics simulator with realistic setting of robot model and low-level impedance control that are easy to transfer the learned skills to real robots. The advantage over traditional methods is the integration of high-level planner and feedback control all in one single coherent policy network, which is generic for learning versatile balancing and recovery motions against unknown perturbations at arbitrary locations (e.g., legs, torso). Furthermore, the proposed framework allows the policy to be learned quickly by many state-of-the-art learning algorithms. By comparing our learned results to studies of preprogrammed, special-purpose controllers in the literature, self-learned skills are comparable in terms of disturbance rejection but with additional advantages of producing a wide range of adaptive, versatile and robust behaviors.



قيم البحث

اقرأ أيضاً

326 - D. Kim , S. Jorgensen , J. Lee 2019
Whole-body control (WBC) is a generic task-oriented control method for feedback control of loco-manipulation behaviors in humanoid robots. The combination of WBC and model-based walking controllers has been widely utilized in various humanoid robots. However, to date, the WBC method has not been employed for unsupported passive-ankle dynamic locomotion. As such, in this paper, we devise a new WBC, dubbed whole-body locomotion controller (WBLC), that can achieve experimental dynamic walking on unsupported passive-ankle biped robots. A key aspect of WBLC is the relaxation of contact constraints such that the control commands produce reduced jerk when switching foot contacts. To achieve robust dynamic locomotion, we conduct an in-depth analysis of uncertainty for our dynamic walking algorithm called time-to-velocity-reversal (TVR) planner. The uncertainty study is fundamental as it allows us to improve the control algorithms and mechanical structure of our robot to fulfill the tolerated uncertainty. In addition, we conduct extensive experimentation for: 1) unsupported dynamic balancing (i.e. in-place stepping) with a six degree-of-freedom (DoF) biped, Mercury; 2) unsupported directional walking with Mercury; 3) walking over an irregular and slippery terrain with Mercury; and 4) in-place walking with our newly designed ten-DoF viscoelastic liquid-cooled biped, DRACO. Overall, the main contributions of this work are on: a) achieving various modalities of unsupported dynamic locomotion of passive-ankle bipeds using a WBLC controller and a TVR planner, b) conducting an uncertainty analysis to improve the mechanical structure and the controllers of Mercury, and c) devising a whole-body control strategy that reduces movement jerk during walking.
While the recent advances in deep reinforcement learning have achieved impressive results in learning motor skills, many of the trained policies are only capable within a limited set of initial states. We propose a technique to break down a complex r obotic task to simpler subtasks and train them sequentially such that the robot can expand its existing skill set gradually. Our key idea is to build a tree of local control policies represented by neural networks, which we refer as Relay Neural Networks. Starting from the root policy that attempts to achieve the task from a small set of initial states, each subsequent policy expands the set of successful initial states by driving the new states to existing good states. Our algorithm utilizes the value function of the policy to determine whether a state is good under each policy. We take advantage of many existing policy search algorithms that learn the value function simultaneously with the policy, such as those that use actor-critic representations or those that use the advantage function to reduce variance. We demonstrate that the relay networks can solve complex continuous control problems for underactuated dynamic systems.
Learning from demonstration (LfD) is commonly considered to be a natural and intuitive way to allow novice users to teach motor skills to robots. However, it is important to acknowledge that the effectiveness of LfD is heavily dependent on the qualit y of teaching, something that may not be assured with novices. It remains an open question as to the most effective way of guiding demonstrators to produce informative demonstrations beyond ad hoc advice for specific teaching tasks. To this end, this paper investigates the use of machine teaching to derive an index for determining the quality of demonstrations and evaluates its use in guiding and training novices to become better teachers. Experiments with a simple learner robot suggest that guidance and training of teachers through the proposed approach can lead to up to 66.5% decrease in error in the learnt skill.
Personal robots assisting humans must perform complex manipulation tasks that are typically difficult to specify in traditional motion planning pipelines, where multiple objectives must be met and the high-level context be taken into consideration. L earning from demonstration (LfD) provides a promising way to learn these kind of complex manipulation skills even from non-technical users. However, it is challenging for existing LfD methods to efficiently learn skills that can generalize to task specifications that are not covered by demonstrations. In this paper, we introduce a state transition model (STM) that generates joint-space trajectories by imitating motions from expert behavior. Given a few demonstrations, we show in real robot experiments that the learned STM can quickly generalize to unseen tasks and synthesize motions having longer time horizons than the expert trajectories. Compared to conventional motion planners, our approach enables the robot to accomplish complex behaviors from high-level instructions without laborious hand-engineering of planning objectives, while being able to adapt to changing goals during the skill execution. In conjunction with a trajectory optimizer, our STM can construct a high-quality skeleton of a trajectory that can be further improved in smoothness and precision. In combination with a learned inverse dynamics model, we additionally present results where the STM is used as a high-level planner. A video of our experiments is available at https://youtu.be/85DX9Ojq-90
Enabling robots to quickly learn manipulation skills is an important, yet challenging problem. Such manipulation skills should be flexible, e.g., be able adapt to the current workspace configuration. Furthermore, to accomplish complex manipulation ta sks, robots should be able to sequence several skills and adapt them to changing situations. In this work, we propose a rapid robot skill-sequencing algorithm, where the skills are encoded by object-centric hidden semi-Markov models. The learned skill models can encode multimodal (temporal and spatial) trajectory distributions. This approach significantly reduces manual modeling efforts, while ensuring a high degree of flexibility and re-usability of learned skills. Given a task goal and a set of generic skills, our framework computes smooth transitions between skill instances. To compute the corresponding optimal end-effector trajectory in task space we rely on Riemannian optimal controller. We demonstrate this approach on a 7 DoF robot arm for industrial assembly tasks.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا