ترغب بنشر مسار تعليمي؟ اضغط هنا

Learning and Planning for Temporally Extended Tasks in Unknown Environments

253   0   0.0 ( 0 )
 نشر من قبل Christopher Bradley
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We propose a novel planning technique for satisfying tasks specified in temporal logic in partially revealed environments. We define high-level actions derived from the environment and the given task itself, and estimate how each action contributes to progress towards completing the task. As the map is revealed, we estimate the cost and probability of success of each action from images and an encoding of that action using a trained neural network. These estimates guide search for the minimum-expected-cost plan within our model. Our learned model is structured to generalize across environments and task specifications without requiring retraining. We demonstrate an improvement in total cost in both simulated and real-world experiments compared to a heuristic-driven baseline.

قيم البحث

اقرأ أيضاً

We present a new framework for motion planning that wraps around existing kinodynamic planners and guarantees recursive feasibility when operating in a priori unknown, static environments. Our approach makes strong guarantees about overall safety and collision avoidance by utilizing a robust controller derived from reachability analysis. We ensure that motion plans never exit the safe backward reachable set of the initial state, while safely exploring the space. This preserves the safety of the initial state, and guarantees that that we will eventually find the goal if it is possible to do so while exploring safely. We implement our framework in the Robot Operating System (ROS) software environment and demonstrate it in a real-time simulation.
Motion planners for mobile robots in unknown environments face the challenge of simultaneously maintaining both robustness against unmodeled uncertainties and persistent feasibility of the trajectory-finding problem. That is, while dealing with uncer tainties, a motion planner must update its trajectory, adapting to the newly revealed environment in real-time; failing to do so may involve unsafe circumstances. Many existing planning algorithms guarantee these by maintaining the clearance needed to perform an emergency brake, which is itself a robust and persistently feasible maneuver. However, such maneuvers are not applicable for systems in which braking is impossible or risky, such as fixed-wing aircraft. To that end, we propose a real-time robust planner that recursively guarantees persistent feasibility without any need of braking. The planner ensures robustness against bounded uncertainties and persistent feasibility by constructing a loop of sequentially composed funnels, starting from the receding horizon local trajectorys forward reachable set. We implement the proposed algorithm for a robotic car tracking a speed-fixed reference trajectory. The experiment results show that the proposed algorithm can be run at faster than 16 Hz, while successfully keeping the system away from entering any dead-end, to maintain safety and feasibility.
Autonomous navigation through unknown environments is a challenging task that entails real-time localization, perception, planning, and control. UAVs with this capability have begun to emerge in the literature with advances in lightweight sensing and computing. Although the planning methodologies vary from platform to platform, many algorithms adopt a hierarchical planning architecture where a slow, low-fidelity global planner guides a fast, high-fidelity local planner. However, in unknown environments, this approach can lead to erratic or unstable behavior due to the interaction between the global planner, whose solution is changing constantly, and the local planner; a consequence of not capturing higher-order dynamics in the global plan. This work proposes a planning framework in which multi-fidelity models are used to reduce the discrepancy between the local and global planner. Our approach uses high-, medium-, and low-fidelity models to compose a path that captures higher-order dynamics while remaining computationally tractable. In addition, we address the interaction between a fast planner and a slower mapper by considering the sensor data not yet fused into the map during the collision check. This novel mapping and planning framework for agile flights is validated in simulation and hardware experiments, showing replanning times of 5-40 ms in cluttered environments.
Predictive auxiliary tasks have been shown to improve performance in numerous reinforcement learning works, however, this effect is still not well understood. The primary purpose of the work presented here is to investigate the impact that an auxilia ry tasks prediction timescale has on the agents policy performance. We consider auxiliary tasks which learn to make on-policy predictions using temporal difference learning. We test the impact of prediction timescale using a specific form of auxiliary task in which the input image is used as the prediction target, which we refer to as temporal difference autoencoders (TD-AE). We empirically evaluate the effect of TD-AE on the A2C algorithm in the VizDoom environment using different prediction timescales. While we do not observe a clear relationship between the prediction timescale on performance, we make the following observations: 1) using auxiliary tasks allows us to reduce the trajectory length of the A2C algorithm, 2) in some cases temporally extended TD-AE performs better than a straight autoencoder, 3) performance with auxiliary tasks is sensitive to the weight placed on the auxiliary loss, 4) despite this sensitivity, auxiliary tasks improved performance without extensive hyper-parameter tuning. Our overall conclusions are that TD-AE increases the robustness of the A2C algorithm to the trajectory length and while promising, further study is required to fully understand the relationship between auxiliary task prediction timescale and the agents performance.
In order for an autonomous robot to efficiently explore an unknown environment, it must account for uncertainty in sensor measurements, hazard assessment, localization, and motion execution. Making decisions for maximal reward in a stochastic setting requires value learning and policy construction over a belief space, i.e., probability distribution over all possible robot-world states. However, belief space planning in a large spatial environment over long temporal horizons suffers from severe computational challenges. Moreover, constructed policies must safely adapt to unexpected changes in the belief at runtime. This work proposes a scalable value learning framework, PLGRIM (Probabilistic Local and Global Reasoning on Information roadMaps), that bridges the gap between (i) local, risk-aware resiliency and (ii) global, reward-seeking mission objectives. Leveraging hierarchical belief space planners with information-rich graph structures, PLGRIM addresses large-scale exploration problems while providing locally near-optimal coverage plans. We validate our proposed framework with high-fidelity dynamic simulations in diverse environments and on physical robots in Martian-analog lava tubes.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا