
Visual Navigation Among Humans with Optimal Control as a Supervisor

Posted by Varun Tolani
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





Real-world visual navigation requires robots to operate in unfamiliar, human-occupied dynamic environments. Navigation around humans is especially difficult because it requires anticipating their future motion, which can be quite challenging. We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans based only on monocular, first-person RGB images. Our approach is enabled by our novel data-generation tool, HumANav, which allows for photorealistic renderings of indoor environment scenes with humans in them; these renderings are then used to train the perception module entirely in simulation. Through simulations and experiments on a mobile robot, we demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion, generalize to previously unseen environments and human behaviors, and transfer directly from simulation to reality. Videos describing our approach and experiments, as well as a demo of HumANav, are available on the project website.
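The abstract describes a two-stage design: a learned perception module processes a monocular RGB image, and a model-based optimal controller executes the resulting motion. Below is a minimal, hypothetical sketch of one such loop in which the perception module predicts an intermediate waypoint; the network architecture, the waypoint parameterization, and the simple unicycle feedback law are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' implementation) of a two-stage
# "learned perception + model-based control" navigation loop:
# a CNN maps a monocular RGB image to a local waypoint, and a simple
# feedback controller for a unicycle model steers toward that waypoint.
import numpy as np
import torch
import torch.nn as nn

class WaypointNet(nn.Module):
    """Hypothetical perception module: RGB image + goal offset -> (x, y, theta) waypoint."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(32 + 2, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, image, goal_xy):
        return self.head(torch.cat([self.encoder(image), goal_xy], dim=1))

def track_waypoint(state, waypoint, k_v=0.5, k_w=1.5):
    """Simple unicycle feedback law toward the waypoint (a stand-in for the
    model-based optimal controller mentioned in the abstract)."""
    x, y, theta = state
    wx, wy, _ = waypoint
    rho = np.hypot(wx - x, wy - y)              # distance to waypoint
    alpha = np.arctan2(wy - y, wx - x) - theta  # heading error
    v = k_v * rho                               # linear velocity command
    w = k_w * np.arctan2(np.sin(alpha), np.cos(alpha))  # angular velocity command
    return v, w

# One control step on dummy inputs.
net = WaypointNet()
image = torch.rand(1, 3, 224, 224)              # monocular first-person RGB frame
goal = torch.tensor([[3.0, 1.0]])               # goal position in the robot frame
waypoint = net(image, goal).detach().numpy()[0]
v, w = track_waypoint(np.zeros(3), waypoint)
print("commanded velocities:", v, w)
```

In this kind of pipeline the controller part is fixed and interpretable, so only the perception network needs to be trained, which is what makes training entirely in simulation (e.g., with HumANav renderings) practical.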




Read also

One of the most important barriers to the widespread use of mobile robots in unstructured, human-populated work environments is the ability to plan a safe path. In this paper, we propose to delegate this activity to a human operator who walks in front of the robot, marking with her/his footsteps the path to be followed. The implementation of this approach requires a high degree of robustness in locating the specific person to be followed (the leader). We propose a three-phase approach to fulfil this goal: 1. identification and tracking of the person in the image space, 2. sensor fusion between camera data and laser sensors, 3. point interpolation with continuous-curvature curves. The approach is described in the paper and extensively validated with experimental results.
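As an illustration of how these three phases could fit together, here is a minimal skeleton with placeholder logic; the bearing-only tracker, the index-based camera/laser association, and the linear blend standing in for continuous-curvature interpolation are all simplifying assumptions rather than the authors' implementation.

```python
# Illustrative skeleton (with placeholder logic, not the authors' code) of the
# three-phase leader-following pipeline described above: (1) track the leader
# in image space, (2) fuse camera and laser estimates, (3) interpolate the
# recorded footstep points into a smooth path.
import numpy as np

def track_leader_in_image(frame):
    """Phase 1 placeholder: return the leader's bearing (rad) from the image."""
    return 0.05  # a real system would run a person detector/tracker here

def fuse_camera_laser(bearing, laser_ranges, angle_min=-np.pi/2, angle_inc=np.pi/180):
    """Phase 2 placeholder: read the laser range closest to the camera bearing."""
    idx = int(round((bearing - angle_min) / angle_inc))
    idx = int(np.clip(idx, 0, len(laser_ranges) - 1))
    r = laser_ranges[idx]
    return np.array([r * np.cos(bearing), r * np.sin(bearing)])  # leader position (x, y)

def interpolate_path(footsteps, samples_per_segment=10):
    """Phase 3 placeholder: densify footstep waypoints (a linear blend stands in
    for the continuous-curvature curves used in the paper)."""
    pts = np.asarray(footsteps, dtype=float)
    path = []
    for i in range(len(pts) - 1):
        for t in np.linspace(0.0, 1.0, samples_per_segment, endpoint=False):
            path.append((1 - t) * pts[i] + t * pts[i + 1])
    path.append(pts[-1])
    return np.array(path)

# One cycle of the pipeline on dummy data.
bearing = track_leader_in_image(frame=None)
leader_xy = fuse_camera_laser(bearing, laser_ranges=np.full(181, 2.0))
path = interpolate_path([[0.0, 0.0], leader_xy])
print("leader at", leader_xy, "path points:", len(path))
```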
Xin Ye, Yezhou Yang (2020)
The Visual Indoor Navigation (VIN) task has drawn increasing attention from the data-driven machine learning communities, especially with the recently reported success of learning-based methods. Due to the innate complexity of this task, researchers have tried approaching the problem from a variety of different angles, the full scope of which has not yet been captured within an overarching report. This survey first summarizes the representative work on learning-based approaches for the VIN task, and then identifies and discusses lingering issues impeding VIN performance, as well as motivating future research in these key areas worth exploring for the community.
Humans can naturally learn to execute a new task by seeing it performed by other individuals once, and then reproduce it in a variety of configurations. Endowing robots with this ability of imitating humans from a third-person view is a very immediate and natural way of teaching new tasks. Only recently, through meta-learning, have there been successful attempts at one-shot imitation learning from humans; however, these approaches require a lot of human resources to collect the data in the real world to train the robot. But is there a way to remove the need for real-world human demonstrations during training? We show that with Task-Embedded Control Networks, we can infer control policies by embedding human demonstrations that can condition a control policy and achieve one-shot imitation learning. Importantly, we do not use a real human arm to supply demonstrations during training, but instead leverage domain randomisation in an application that has not been seen before: sim-to-real transfer on humans. Upon evaluating our approach on pushing and placing tasks in both simulation and the real world, we show that, in comparison to a system that was trained on real-world data, we are able to achieve similar results by utilising only simulation data.
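To make the task-embedding idea concrete, here is a hedged sketch of one way a demonstration clip could be mapped to a task vector that conditions a control policy; the architecture, embedding size, and action dimensionality are illustrative assumptions, not the Task-Embedded Control Networks implementation.

```python
# Minimal, hypothetical sketch of the task-embedding idea described above:
# a demonstration clip is embedded into a single task vector, and a control
# policy is conditioned on that vector plus the current observation.
import torch
import torch.nn as nn

class TaskEmbedding(nn.Module):
    """Embed a short demonstration clip (T frames) into one normalized task vector."""
    def __init__(self, emb_dim=20):
        super().__init__()
        self.frame_enc = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, emb_dim),
        )

    def forward(self, demo):                  # demo: (T, 3, H, W)
        z = self.frame_enc(demo).mean(dim=0)  # average frame embeddings over time
        return z / z.norm()

class ConditionedPolicy(nn.Module):
    """Map (current image, task embedding) to an action vector."""
    def __init__(self, emb_dim=20, action_dim=4):
        super().__init__()
        self.obs_enc = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(16 + emb_dim, 64), nn.ReLU(),
                                  nn.Linear(64, action_dim))

    def forward(self, obs, task_z):           # obs: (1, 3, H, W)
        feat = self.obs_enc(obs)
        return self.head(torch.cat([feat, task_z.expand(feat.shape[0], -1)], dim=1))

# One-shot use: embed a single (simulated) human demonstration, then act on it.
embed, policy = TaskEmbedding(), ConditionedPolicy()
task_z = embed(torch.rand(8, 3, 64, 64))      # 8-frame demonstration clip
action = policy(torch.rand(1, 3, 64, 64), task_z)
print(action.shape)                           # torch.Size([1, 4])
```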
We propose a learning-based navigation system for reaching visually indicated goals and demonstrate this system on a real mobile robot platform. Learning provides an appealing alternative to conventional methods for robotic navigation: instead of reasoning about environments in terms of geometry and maps, learning can enable a robot to learn about navigational affordances, understand what types of obstacles are traversable (e.g., tall grass) or not (e.g., walls), and generalize over patterns in the environment. However, unlike conventional planning algorithms, it is harder to change the goal for a learned policy during deployment. We propose a method for learning to navigate towards a goal image of the desired destination. By combining a learned policy with a topological graph constructed out of previously observed data, our system can determine how to reach this visually indicated goal even in the presence of variable appearance and lighting. Three key insights (waypoint proposal, graph pruning, and negative mining) enable our method to learn to navigate in real-world environments using only offline data, a setting where prior methods struggle. We instantiate our method on a real outdoor ground robot and show that our system, which we call ViNG, outperforms previously proposed methods for goal-conditioned reinforcement learning, including other methods that incorporate reinforcement learning and search. We also study how ViNG generalizes to unseen environments and evaluate its ability to adapt to such an environment with growing experience. Finally, we demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection. We encourage the reader to visit the project website for videos of our experiments and demonstrations: sites.google.com/view/ving-robot.
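The core mechanism here, a topological graph over previously observed images that is searched for a route toward the node best matching a goal image, can be sketched as follows. The mean-color embedding, the distance-based reachability cost, and the plain Dijkstra search are stand-ins chosen for brevity; they are not the learned components ViNG actually uses.

```python
# Hedged sketch (not the ViNG implementation) of goal-image navigation with a
# topological graph: nodes are previously observed images, edge costs come from
# a stand-in "reachability" score, and the planner returns a node sequence
# toward the node most similar to the goal image.
import heapq
import numpy as np

def embed(image):
    """Placeholder image embedding; a learned encoder would go here."""
    return image.mean(axis=(0, 1))  # mean color as a toy descriptor

def reachability_cost(z_a, z_b):
    """Placeholder for a learned 'how hard is it to drive from a to b' estimate."""
    return float(np.linalg.norm(z_a - z_b))

def build_graph(observations, edge_threshold=0.5):
    """Connect observation pairs whose estimated traversal cost is small."""
    zs = [embed(o) for o in observations]
    edges = {i: [] for i in range(len(zs))}
    for i in range(len(zs)):
        for j in range(len(zs)):
            c = reachability_cost(zs[i], zs[j])
            if i != j and c < edge_threshold:
                edges[i].append((j, c))
    return zs, edges

def plan(zs, edges, start, goal_image):
    """Dijkstra to the node whose embedding best matches the goal image."""
    goal = int(np.argmin([np.linalg.norm(z - embed(goal_image)) for z in zs]))
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        for v, c in edges[u]:
            if d + c < dist.get(v, float("inf")):
                dist[v], prev[v] = d + c, u
                heapq.heappush(pq, (d + c, v))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Toy run: 20 random "images", navigate from node 0 toward a goal image.
rng = np.random.default_rng(0)
obs = [rng.random((8, 8, 3)) for _ in range(20)]
zs, edges = build_graph(obs)
print(plan(zs, edges, start=0, goal_image=obs[-1]))
```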
We train embodied neural networks to plan and navigate unseen complex 3D environments, emphasising real-world deployment. Rather than requiring prior knowledge of the agent or environment, the planner learns to model the state transitions and rewards. To avoid the potentially hazardous trial-and-error of reinforcement learning, we focus on differentiable planners such as Value Iteration Networks (VIN), which are trained offline from safe expert demonstrations. Although they work well in small simulations, we address two major limitations that hinder their deployment. First, we observed that current differentiable planners struggle to plan long-term in environments with high branching complexity. While they should ideally learn to assign low rewards to obstacles to avoid collisions, we posit that the constraints imposed on the network are not strong enough to guarantee that it learns sufficiently large penalties for every possible collision. We thus impose a structural constraint on the value iteration, which explicitly learns to model any impossible actions. Secondly, we extend the model to work with a limited-perspective camera under translation and rotation, which is crucial for real robot deployment. Many VIN-like planners assume a 360-degree or overhead view without rotation. In contrast, our method uses a memory-efficient lattice map to aggregate CNN embeddings of partial observations, and models the rotational dynamics explicitly using a 3D state-space grid (translation and rotation). Our proposals significantly improve semantic navigation and exploration in several 2D and 3D environments, succeeding in settings that are otherwise challenging for this class of methods. As far as we know, we are the first to successfully perform differentiable planning on the difficult Active Vision Dataset, which consists of real images captured from a robot.
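For intuition about the structural constraint on value iteration, here is a hedged toy example in a 2D grid world: transitions corresponding to impossible actions are masked out explicitly instead of relying on learned penalties. The grid world, reward values, and masking scheme are illustrative assumptions, not the paper's learned network.

```python
# Toy value iteration with an explicit mask on impossible actions (moves off the
# grid or into obstacles). The mask plays the role of the structural constraint
# discussed above; everything else here is a deliberately simple stand-in.
import numpy as np

def masked_value_iteration(reward, obstacles, gamma=0.95, iters=100):
    """reward, obstacles: (H, W) arrays; returns the state-value map V."""
    H, W = reward.shape
    V = np.zeros((H, W))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    for _ in range(iters):
        Q = np.full((len(moves), H, W), -np.inf)  # -inf = masked / impossible
        for a, (dy, dx) in enumerate(moves):
            for y in range(H):
                for x in range(W):
                    ny, nx = y + dy, x + dx
                    # Structural mask: leave Q at -inf for moves that leave the
                    # grid or enter an obstacle, rather than hoping a learned
                    # reward makes them unattractive.
                    if 0 <= ny < H and 0 <= nx < W and not obstacles[ny, nx]:
                        Q[a, y, x] = reward[ny, nx] + gamma * V[ny, nx]
        V = Q.max(axis=0)
        V[obstacles] = 0.0  # obstacle cells themselves are never occupied
    return V

# Toy map: goal in the corner, a wall across the middle.
H, W = 6, 6
obstacles = np.zeros((H, W), dtype=bool)
obstacles[2, 1:5] = True
reward = np.full((H, W), -0.1)
reward[5, 5] = 1.0
print(np.round(masked_value_iteration(reward, obstacles), 2))
```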
