ﻻ يوجد ملخص باللغة العربية
Offline reinforcement learning (RL) enables learning policies using pre-collected datasets without environment interaction, which provides a promising direction to make RL useable in real-world systems. Although recent offline RL studies have achieved much progress, existing methods still face many practical challenges in real-world system control tasks, such as computational restriction during agent training and the requirement of extra control flexibility. Model-based planning framework provides an attractive solution for such tasks. However, most model-based planning algorithms are not designed for offline settings. Simply combining the ingredients of offline RL with existing methods either provides over-restrictive planning or leads to inferior performance. We propose a new light-weighted model-based offline planning framework, namely MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning. MOPP encourages more aggressive trajectory rollout guided by the behavior policy learned from data, and prunes out problematic trajectories to avoid potential out-of-distribution samples. Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches, and allows easy adaptation to varying objectives and extra constraints.
Offline learning is a key part of making reinforcement learning (RL) useable in real systems. Offline RL looks at scenarios where there is data from a systems operation, but no direct access to the system when learning a policy. Recent work on traini
We present a framework for bi-level trajectory optimization in which a systems dynamics are encoded as the solution to a constrained optimization problem and smooth gradients of this lower-level problem are passed to an upper-level trajectory optimiz
New autonomous driving technologies are emerging every day and some of them have been commercially applied in the real world. While benefiting from these technologies, autonomous trucks are facing new challenges in short-term maintenance planning, wh
We introduce a prioritized system-optimal algorithm for mandatory lane change (MLC) behavior of connected and automated vehicles (CAV) from a dedicated lane. Our approach applies a cooperative lane change that prioritizes the decisions of lane changi
In this paper, we consider a wireless uplink transmission scenario in which an unmanned aerial vehicle (UAV) serves as an aerial base station collecting data from ground users. To optimize the expected sum uplink transmit rate without any prior knowl