ترغب بنشر مسار تعليمي؟ اضغط هنا

Evolving Inborn Knowledge For Fast Adaptation in Dynamic POMDP Problems

213   0   0.0 ( 0 )
 نشر من قبل Eseoghene Ben-Iwhiwhu
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Rapid online adaptation to changing tasks is an important problem in machine learning and, recently, a focus of meta-reinforcement learning. However, reinforcement learning (RL) algorithms struggle in POMDP environments because the state of the system, essential in a RL framework, is not always visible. Additionally, hand-designed meta-RL architectures may not include suitable computational structures for specific learning problems. The evolution of online learning mechanisms, on the contrary, has the ability to incorporate learning strategies into an agent that can (i) evolve memory when required and (ii) optimize adaptation speed to specific online learning problems. In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP. The analysis of the evolved networks reveals the ability of the proposed algorithm to acquire inborn knowledge in a variety of aspects such as the detection of cues that reveal implicit rewards, and the ability to evolve location neurons that help with navigation. The integration of inborn knowledge and online plasticity enabled fast adaptation and better performance in comparison to some non-evolutionary meta-reinforcement learning algorithms. The algorithm proved also to succeed in the 3D gaming environment Malmo Minecraft.



قيم البحث

اقرأ أيضاً

This paper reviews the overview of the dynamic shortest path routing problem and the various neural networks to solve it. Different shortest path optimization problems can be solved by using various neural networks algorithms. The routing in packet s witched multi-hop networks can be described as a classical combinatorial optimization problem i.e. a shortest path routing problem in graphs. The survey shows that the neural networks are the best candidates for the optimization of dynamic shortest path routing problems due to their fastness in computation comparing to other softcomputing and metaheuristics algorithms
Unsupervised anomaly discovery in stream data is a research topic with many practical applications. However, in many cases, it is not easy to collect enough training data with labeled anomalies for supervised learning of an anomaly detector in order to deploy it later for identification of real anomalies in streaming data. It is thus important to design anomalies detectors that can correctly detect anomalies without access to labeled training data. Our idea is to adapt the Online evolving Spiking Neural Network (OeSNN) classifier to the anomaly detection task. As a result, we offer an Online evolving Spiking Neural Network for Unsupervised Anomaly Detection algorithm (OeSNN-UAD), which, unlike OeSNN, works in an unsupervised way and does not separate output neurons into disjoint decision classes. OeSNN-UAD uses our proposed new two-step anomaly detection method. Also, we derive new theoretical properties of neuronal model and input layer encoding of OeSNN, which enable more effective and efficient detection of anomalies in our OeSNN-UAD approach. The proposed OeSNN-UAD detector was experimentally compared with state-of-the-art unsupervised and semi-supervised detectors of anomalies in stream data from the Numenta Anomaly Benchmark and Yahoo Anomaly Datasets repositories. Our approach outperforms the other solutions provided in the literature in the case of data streams from the Numenta Anomaly Benchmark repository. Also, in the case of real data files of the Yahoo Anomaly Benchmark repository, OeSNN-UAD outperforms other selected algorithms, whereas in the case of Yahoo Anomaly Benchmark synthetic data files, it provides competitive results to the results recently reported in the literature.
189 - Samuel Schmidgall 2020
The adaptive learning capabilities seen in biological neural networks are largely a product of the self-modifying behavior emerging from online plastic changes in synaptic connectivity. Current methods in Reinforcement Learning (RL) only adjust to ne w interactions after reflection over a specified time interval, preventing the emergence of online adaptivity. Recent work addressing this by endowing artificial neural networks with neuromodulated plasticity have been shown to improve performance on simple RL tasks trained using backpropagation, but have yet to scale up to larger problems. Here we study the problem of meta-learning in a challenging quadruped domain, where each leg of the quadruped has a chance of becoming unusable, requiring the agent to adapt by continuing locomotion with the remaining limbs. Results demonstrate that agents evolved using self-modifying plastic networks are more capable of adapting to complex meta-learning learning tasks, even outperforming the same network updated using gradient-based algorithms while taking less time to train.
CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems. The increasing popularity of these constraint problems in electronic design automation encourages studies on different SAT problems and their properties for fur ther computational efficiency. There has been both theoretical and practical success of modern Conflict-driven clause learning SAT solvers, which allows solving very large industrial instances in a relatively short amount of time. Recently, machine learning approaches provide a new dimension to solving this challenging problem. Neural symbolic models could serve as generic solvers that can be specialized for specific domains based on data without any changes to the structure of the model. In this work, we propose a one-shot model derived from the Transformer architecture to solve the MaxSAT problem, which is the optimization version of SAT where the goal is to satisfy the maximum number of clauses. Our model has a scale-free structure which could process varying size of instances. We use meta-path and self-attention mechanism to capture interactions among homogeneous nodes. We adopt cross-attention mechanisms on the bipartite graph to capture interactions among heterogeneous nodes. We further apply an iterative algorithm to our model to satisfy additional clauses, enabling a solution approaching that of an exact-SAT problem. The attention mechanisms leverage the parallelism for speedup. Our evaluation indicates improved speedup compared to heuristic approaches and improved completion rate compared to machine learning approaches.
In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure. We discuss and compare algorithms that simultaneously or sequentially opt imize the agents controls by using multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. Our methods specifically address the computational challenges of partially observable multiagent problems. In particular: 1) We consider rollout algorithms that dramatically reduce required computation while preserving the key cost improvement property of the standard rollout method. The per-step computational requirements for our methods are on the order of $O(Cm)$ as compared with $O(C^m)$ for standard rollout, where $C$ is the maximum cardinality of the constraint set for the control component of each agent, and $m$ is the number of agents. 2) We show that our methods can be applied to challenging problems with a graph structure, including a class of robot repair problems whereby multiple robots collaboratively inspect and repair a system under partial information. 3) We provide a simulation study that compares our methods with existing methods, and demonstrate that our methods can handle larger and more complex partially observable multiagent problems (state space size $10^{37}$ and control space size $10^{7}$, respectively). Finally, we incorporate our multiagent rollout algorithms as building blocks in an approximate policy iteration scheme, where successive rollout policies are approximated by using neural network classifiers. While this scheme requires a strictly off-line implementation, it works well in our computational experiments and produces additional significant performance improvement over the single online rollout iteration method.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا