ترغب بنشر مسار تعليمي؟ اضغط هنا

Clone Swarms: Learning to Predict and Control Multi-Robot Systems by Imitation

104   0   0.0 ( 0 )
 نشر من قبل Siyu Zhou
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In this paper, we propose SwarmNet -- a neural network architecture that can learn to predict and imitate the behavior of an observed swarm of agents in a centralized manner. Tested on artificially generated swarm motion data, the network achieves high levels of prediction accuracy and imitation authenticity. We compare our model to previous approaches for modelling interaction systems and show how modifying components of other models gradually approaches the performance of ours. Finally, we also discuss an extension of SwarmNet that can deal with nondeterministic, noisy, and uncertain environments, as often found in robotics applications.

قيم البحث

اقرأ أيضاً

In swarm robotics, any of the robots in a swarm may be affected by different faults, resulting in significant performance declines. To allow fault recovery from randomly injected faults to different robots in a swarm, a model-free approach may be pre ferable due to the accumulation of faults in models and the difficulty to predict the behaviour of neighbouring robots. One model-free approach to fault recovery involves two phases: during simulation, a quality-diversity algorithm evolves a behaviourally diverse archive of controllers; during the target application, a search for the best controller is initiated after fault injection. In quality-diversity algorithms, the choice of the behavioural descriptor is a key design choice that determines the quality of the evolved archives, and therefore the fault recovery performance. Although the environment is an important determinant of behaviour, the impact of environmental diversity is often ignored in the choice of a suitable behavioural descriptor. This study compares different behavioural descriptors, including two generic descriptors that work on a wide range of tasks, one hand-coded descriptor which fits the domain of interest, and one novel type of descriptor based on environmental diversity, which we call Quality-Environment-Diversity (QED). Results demonstrate that the above-mentioned model-free approach to fault recovery is feasible in the context of swarm robotics, reducing the fault impact by a factor 2-3. Further, the environmental diversity obtained with QED yields a unique behavioural diversity profile that allows it to recover from high-impact faults.
We present a novel solution to the problem of simulation-to-real transfer, which builds on recent advances in robot skill decomposition. Rather than focusing on minimizing the simulation-reality gap, we learn a set of diverse policies that are parame terized in a way that makes them easily reusable. This diversity and parameterization of low-level skills allows us to find a transferable policy that is able to use combinations and variations of different skills to solve more complex, high-level tasks. In particular, we first use simulation to jointly learn a policy for a set of low-level skills, and a skill embedding parameterization which can be used to compose them. Later, we learn high-level policies which actuate the low-level policies via this skill embedding parameterization. The high-level policies encode how and when to reuse the low-level skills together to achieve specific high-level tasks. Importantly, our method learns to control a real robot in joint-space to achieve these high-level tasks with little or no on-robot time, despite the fact that the low-level policies may not be perfectly transferable from simulation to real, and that the low-level skills were not trained on any examples of high-level tasks. We illustrate the principles of our method using informative simulation experiments. We then verify its usefulness for real robotics problems by learning, transferring, and composing free-space and contact motion skills on a Sawyer robot using only joint-space control. We experiment with several techniques for composing pre-learned skills, and find that our method allows us to use both learning-based approaches and efficient search-based planning to achieve high-level tasks using only pre-learned skills.
Decentralized drone swarms deployed today either rely on sharing of positions among agents or detecting swarm members with the help of visual markers. This work proposes an entirely visual approach to coordinate markerless drone swarms based on imita tion learning. Each agent is controlled by a small and efficient convolutional neural network that takes raw omnidirectional images as inputs and predicts 3D velocity commands that match those computed by a flocking algorithm. We start training in simulation and propose a simple yet effective unsupervised domain adaptation approach to transfer the learned controller to the real world. We further train the controller with data collected in our motion capture hall. We show that the convolutional neural network trained on the visual inputs of the drone can learn not only robust inter-agent collision avoidance but also cohesion of the swarm in a sample-efficient manner. The neural controller effectively learns to localize other agents in the visual input, which we show by visualizing the regions with the most influence on the motion of an agent. We remove the dependence on sharing positions among swarm members by taking only local visual information into account for control. Our work can therefore be seen as the first step towards a fully decentralized, vision-based swarm without the need for communication or visual markers.
Learned world models summarize an agents experience to facilitate learning complex behaviors. While learning world models from high-dimensional sensory inputs is becoming feasible through deep learning, there are many potential ways for deriving beha viors from them. We present Dreamer, a reinforcement learning agent that solves long-horizon tasks from images purely by latent imagination. We efficiently learn behaviors by propagating analytic gradients of learned state values back through trajectories imagined in the compact state space of a learned world model. On 20 challenging visual control tasks, Dreamer exceeds existing approaches in data-efficiency, computation time, and final performance.
This paper considers the distributed sampled-data control problem of a group of mobile robots connected via distance-induced proximity networks. A dwell time is assumed in order to avoid chattering in the neighbor relations that may be caused by abru pt changes of positions when updating information from neighbors. Distributed sampled-data control laws are designed based on nearest neighbour rules, which in conjunction with continuous-time dynamics results in hybrid closed-loop systems. For uniformly and independently initial states, a sufficient condition is provided to guarantee synchronization for the system without leaders. In order to steer all robots to move with the desired orientation and speed, we then introduce a number of leaders into the system, and quantitatively establish the proportion of leaders needed to track either constant or time-varying signals. All these conditions depend only on the neighborhood radius, the maximum initial moving speed and the dwell time, without assuming a prior properties of the neighbor graphs as are used in most of the existing literature.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا