ﻻ يوجد ملخص باللغة العربية
AlphaGos astonishing performance has ignited an explosive interest in developing deep reinforcement learning (DRL) for numerous real-world applications, such as intelligent robotics. However, the often prohibitive complexity of DRL stands at the odds with the required real-time control and constrained resources in many DRL applications, limiting the great potential of DRL powered intelligent devices. While substantial efforts have been devoted to compressing other deep learning models, existing works barely touch the surface of compressing DRL. In this work, we first identify that there exists an optimal model size of DRL that can maximize both the test scores and efficiency, motivating the need for task-specific DRL agents. We therefore propose an Auto-Agent-Distiller (A2D) framework, which to our best knowledge is the first neural architecture search (NAS) applied to DRL to automatically search for the optimal DRL agents for various tasks that optimize both the test scores and efficiency. Specifically, we demonstrate that vanilla NAS can easily fail in searching for the optimal agents, due to its resulting high variance in DRL training stability, and then develop a novel distillation mechanism to distill the knowledge from both the teacher agents actor and critic to stabilize the searching process and improve the searched agents optimality. Extensive experiments and ablation studies consistently validate our findings and the advantages and general applicability of our A2D, outperforming manually designed DRL in both the test scores and efficiency. All the codes will be released upon acceptance.
Driven by the explosive interest in applying deep reinforcement learning (DRL) agents to numerous real-time control and decision-making applications, there has been a growing demand to deploy DRL agents to empower daily-life intelligent devices, whil
Neural architecture search (NAS) has been proposed to automatically tune deep neural networks, but existing search algorithms, e.g., NASNet, PNAS, usually suffer from expensive computational cost. Network morphism, which keeps the functionality of a
Differentiable neural architecture search (DNAS) is known for its capacity in the automatic generation of superior neural networks. However, DNAS based methods suffer from memory usage explosion when the search space expands, which may prevent them f
Cancer is a complex disease, the understanding and treatment of which are being aided through increases in the volume of collected data and in the scale of deployed computing power. Consequently, there is a growing need for the development of data-dr
Recent advances in quantum computing have drawn considerable attention to building realistic application for and using quantum computers. However, designing a suitable quantum circuit architecture requires expert knowledge. For example, it is non-tri