ES-ENAS: Controller-Based Architecture Search for Evolutionary Reinforcement Learning

158 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Xingyou Song

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Xingyou Song - Krzysztof Choromanski - Jack Parker-Holder

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We introduce ES-ENAS, a simple yet general evolutionary joint optimization procedure by combining continuous optimization via Evolutionary Strategies (ES) and combinatorial optimization via Efficient NAS (ENAS) in a highly scalable and intuitive way. Our main insight is noticing that ES is already a highly distributed algorithm involving hundreds of forward passes which can not only be used for training neural network weights, but also for jointly training a NAS controller, both in a blackbox fashion. By doing so, we also bridge the gap from NAS research in supervised learning settings to the reinforcement learning scenario through this relatively simple marriage between two different yet common lines of research. We demonstrate the utility and effectiveness of our method over a large search space by training highly combinatorial neural network architectures for RL problems in continuous control, via edge pruning and quantization. We also incorporate a wide variety of popular techniques from modern NAS literature including multiobjective optimization along with various controller methods, to showcase their promise in the RL field and discuss possible extensions.

قيم البحث

221 - Xin Chen , Yawen Duan , Zewei Chen 2020

Neural Architecture Search (NAS) achieved many breakthroughs in recent years. In spite of its remarkable progress, many algorithms are restricted to particular search spaces. They also lack efficient mechanisms to reuse knowledge when confronting mul tiple tasks. These challenges preclude their applicability, and motivate our proposal of CATCH, a novel Context-bAsed meTa reinforcement learning (RL) algorithm for transferrable arChitecture searcH. The combination of meta-learning and RL allows CATCH to efficiently adapt to new tasks while being agnostic to search spaces. CATCH utilizes a probabilistic encoder to encode task properties into latent context variables, which then guide CATCHs controller to quickly catch top-performing networks. The contexts also assist a network evaluator in filtering inferior candidates and speed up learning. Extensive experiments demonstrate CATCHs universality and search efficiency over many other widely-recognized algorithms. It is also capable of handling cross-domain architecture search as competitive networks on ImageNet, COCO, and Cityscapes are identified. This is the first work to our knowledge that proposes an efficient transferrable NAS solution while maintaining robustness across various settings.

التعلم الآلي الذكاء الاصطناعي

Scalable Reinforcement-Learning-Based Neural Architecture Search for Cancer Deep Learning Research

147 - Prasanna Balaprakash , Romain Egele , Misha Salim 2019

Cancer is a complex disease, the understanding and treatment of which are being aided through increases in the volume of collected data and in the scale of deployed computing power. Consequently, there is a growing need for the development of data-dr iven and, in particular, deep learning methods for various tasks such as cancer diagnosis, detection, prognosis, and prediction. Despite recent successes, however, designing high-performing deep learning models for nonimage and nontext cancer data is a time-consuming, trial-and-error, manual task that requires both cancer domain and deep learning expertise. To that end, we develop a reinforcement-learning-based neural architecture search to automate deep-learning-based predictive model development for a class of representative cancer data. We develop custom building blocks that allow domain experts to incorporate the cancer-data-specific characteristics. We show that our approach discovers deep neural network architectures that have significantly fewer trainable parameters, shorter training time, and accuracy similar to or higher than those of manually designed architectures. We study and demonstrate the scalability of our approach on up to 1,024 Intel Knights Landing nodes of the Theta supercomputer at the Argonne Leadership Computing Facility.

التعلم الآلي التعلم الالي

Transformer Based Reinforcement Learning For Games

78 - Uddeshya Upadhyay , Nikunj Shah , Sucheta Ravikanti 2019

Recent times have witnessed sharp improvements in reinforcement learning tasks using deep reinforcement learning techniques like Deep Q Networks, Policy Gradients, Actor Critic methods which are based on deep learning based models and back-propagatio n of gradients to train such models. An active area of research in reinforcement learning is about training agents to play complex video games, which so far has been something accomplished only by human intelligence. Some state of the art performances in video game playing using deep reinforcement learning are obtained by processing the sequence of frames from video games, passing them through a convolutional network to obtain features and then using recurrent neural networks to figure out the action leading to optimal rewards. The recurrent neural network will learn to extract the meaningful signal out of the sequence of such features. In this work, we propose a method utilizing a transformer network which have recently replaced RNNs in Natural Language Processing (NLP), and perform experiments to compare with existing methods.

التعلم الآلي الحوسبة العصبية والتطورية

RL-DARTS: Differentiable Architecture Search for Reinforcement Learning

336 - Yingjie Miao , Xingyou Song , Daiyi Peng 2021

We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL) to search for convolutional cells, applied to the Procgen benchmark. We outline the initial difficulties of applying neu ral architecture search techniques in RL, and demonstrate that by simply replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code. Surprisingly, we find that the supernet can be used as an actor for inference to generate replay data in standard RL training loops, and thus train end-to-end. Throughout this training process, we show that the supernet gradually learns better cells, leading to alternative architectures which can be highly competitive against manually designed policies, but also verify previous design choices for RL policies.

التعلم الآلي الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

UNAS: Differentiable Architecture Search Meets Reinforcement Learning

84 - Arash Vahdat , Arun Mallya , Ming-Yu Liu 2019

Neural architecture search (NAS) aims to discover network architectures with desired properties such as high accuracy or low latency. Recently, differentiable NAS (DNAS) has demonstrated promising results while maintaining a search cost orders of mag nitude lower than reinforcement learning (RL) based NAS. However, DNAS models can only optimize differentiable loss functions in search, and they require an accurate differentiable approximation of non-differentiable criteria. In this work, we present UNAS, a unified framework for NAS, that encapsulates recent DNAS and RL-based approaches under one framework. Our framework brings the best of both worlds, and it enables us to search for architectures with both differentiable and non-differentiable criteria in one unified framework while maintaining a low search cost. Further, we introduce a new objective function for search based on the generalization gap that prevents the selection of architectures prone to overfitting. We present extensive experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets and we perform search in two fundamentally different search spaces. We show that UNAS obtains the state-of-the-art average accuracy on all three datasets when compared to the architectures searched in the DARTS space. Moreover, we show that UNAS can find an efficient and accurate architecture in the ProxylessNAS search space, that outperforms existing MobileNetV2 based architectures. The source code is available at https://github.com/NVlabs/unas .

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التعلم الالي