Deep reinforcement learning (RL) methods often require many trials before convergence and provide no direct interpretability of the trained policies. To achieve fast convergence and policy interpretability in RL, we propose a novel RL method for text-based games built on a recent neuro-symbolic framework, the Logical Neural Network, which can learn symbolic and interpretable rules in its differentiable network. The method first extracts first-order logical facts from the text observation and an external word-meaning network (ConceptNet), and then trains a policy in the network with directly interpretable logical operators. Our experimental results show that RL training with the proposed method converges significantly faster than other state-of-the-art neuro-symbolic methods on a TextWorld benchmark.
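A minimal sketch of the idea, not the paper's pipeline or the Logical Neural Network library API: pattern-based fact extraction from a TextWorld-style observation, a toy stand-in for ConceptNet lookups, and a differentiable AND node scoring one action. The rule template and fact patterns are illustrative assumptions.

```python
# Hypothetical, simplified sketch: extract first-order facts from text, add
# word-meaning facts, and score an action with a soft differentiable AND.
import re
import torch

def extract_facts(observation: str) -> dict:
    """Rough pattern-based extraction, e.g. 'a carrot on the counter' -> ('on', 'carrot', 'counter')."""
    facts = {}
    for obj, loc in re.findall(r"a (\w+) on the (\w+)", observation):
        facts[("on", obj, loc)] = 1.0          # observed facts get truth value 1.0
    return facts

# Toy stand-in for a ConceptNet lookup: word-meaning facts added to the state.
CONCEPT_FACTS = {("isa", "carrot", "food"): 1.0, ("isa", "knife", "tool"): 1.0}

class AndRule(torch.nn.Module):
    """Differentiable conjunction with learnable weights, loosely in the spirit of an LNN AND node."""
    def __init__(self, n_inputs: int):
        super().__init__()
        self.w = torch.nn.Parameter(torch.ones(n_inputs))

    def forward(self, truths: torch.Tensor) -> torch.Tensor:
        # Weighted product t-norm: output near 1 only when all inputs are near 1.
        return torch.prod(truths ** torch.softmax(self.w, dim=0))

# Rule for the action "take carrot": isa(carrot, food) AND on(carrot, counter).
rule = AndRule(n_inputs=2)
state = {**extract_facts("There is a carrot on the counter."), **CONCEPT_FACTS}
truths = torch.tensor([state.get(("isa", "carrot", "food"), 0.0),
                       state.get(("on", "carrot", "counter"), 0.0)])
print("score for 'take carrot':", rule(truths).item())
```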
Most reinforcement learning methods for dialog policy learning train a centralized agent that selects a predefined joint action concatenating domain name, intent type, and slot name. Because of the large action space, the centralized dialog agent requires a great many user-agent interactions. Besides, designing the concatenated actions is laborious for engineers and may struggle with edge cases. To solve these problems, we model the dialog policy learning problem with a novel multi-agent framework, in which each part of the action is handled by a different agent. The framework reduces the labor cost of action templates and decreases the size of the action space for each agent. Furthermore, we relieve the non-stationarity caused by the changing dynamics of the environment as the agents' policies evolve by introducing a joint optimization process that allows agents to exchange their policy information. Concurrently, an independent experience replay buffer mechanism is integrated to reduce the dependence between the gradients of samples and improve training efficiency. The effectiveness of the proposed framework is demonstrated in a multi-domain environment with both user-simulator evaluation and human evaluation.
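A structural sketch only, not the paper's implementation: one small policy head per action component (domain, intent, slot), each paired with its own replay buffer so minibatches are sampled independently. The state size, sub-action vocabulary sizes, and training loop are assumptions made for illustration.

```python
# Hypothetical multi-agent decomposition of a dialog act into domain/intent/slot.
from collections import deque
import torch

SUB_ACTIONS = {"domain": 5, "intent": 8, "slot": 12}   # illustrative vocabulary sizes
STATE_DIM = 32

class SubAgent:
    def __init__(self, n_actions: int):
        self.policy = torch.nn.Sequential(torch.nn.Linear(STATE_DIM, 64),
                                          torch.nn.ReLU(),
                                          torch.nn.Linear(64, n_actions))
        self.buffer = deque(maxlen=10_000)               # independent experience replay
        self.opt = torch.optim.Adam(self.policy.parameters(), lr=1e-3)

    def act(self, state: torch.Tensor) -> int:
        probs = torch.softmax(self.policy(state), dim=-1)
        return torch.multinomial(probs, 1).item()

agents = {name: SubAgent(n) for name, n in SUB_ACTIONS.items()}

state = torch.randn(STATE_DIM)
joint_action = {name: agent.act(state) for name, agent in agents.items()}
print("joint dialog act:", joint_action)   # e.g. {'domain': 2, 'intent': 5, 'slot': 7}

# Each agent stores and later samples its own transitions; the joint optimization
# and policy-information exchange described in the abstract would plug in here.
for name, agent in agents.items():
    agent.buffer.append((state, joint_action[name], 0.0, state))
```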
Deep reinforcement learning provides a promising approach for text-based games in studying natural language communication between humans and artificial agents. However, generalization remains a big challenge, as agents depend critically on the complexity and variety of the training tasks. In this paper, we address this problem by introducing a hierarchical framework built upon a knowledge graph (KG)-based RL agent. At the high level, a meta-policy is executed to decompose the whole game into a set of subtasks specified by textual goals and to select one of them based on the KG. A sub-policy at the low level is then executed to conduct goal-conditioned reinforcement learning. We carry out experiments on games with various difficulty levels and show that the proposed method enjoys favorable generalizability.
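A minimal control-flow sketch of the hierarchy described above; the subgoal-selection rule, the knowledge graph contents, and the environment interface are placeholders rather than the paper's agent.

```python
# Hypothetical meta-policy / goal-conditioned sub-policy decomposition.
import random

def meta_policy(knowledge_graph: set) -> str:
    """Pick one textual subgoal; a learned scorer over the KG would go here."""
    candidates = [f"take {obj}" for (rel, obj, loc) in knowledge_graph if rel == "at"]
    return random.choice(candidates)

def sub_policy(goal: str, observation: str) -> str:
    """Goal-conditioned low-level policy; here just a stub keyed on the goal object."""
    return goal if goal.split()[-1] in observation else "look"

knowledge_graph = {("at", "key", "kitchen"), ("at", "apple", "garden")}
goal = meta_policy(knowledge_graph)
for step in range(3):                                  # low-level rollout for one subtask
    observation = "You are in the kitchen. A key is here."
    action = sub_policy(goal, observation)
    print(f"goal={goal!r} -> action={action!r}")
```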
Low-resource Relation Extraction (LRE) aims to extract relation facts from limited labeled corpora when human annotation is scarce. Existing works either utilize a self-training scheme to generate pseudo labels, which causes the gradual drift problem, or leverage a meta-learning scheme that does not solicit feedback explicitly. To alleviate the selection bias caused by the lack of feedback loops in existing LRE learning paradigms, we develop a Gradient Imitation Reinforcement Learning method that encourages pseudo-labeled data to imitate the gradient descent direction on labeled data and bootstraps its optimization capability through trial and error. We also propose a framework called GradLRE, which handles two major scenarios in low-resource relation extraction. Besides the scenario where unlabeled data is sufficient, GradLRE handles the situation where no unlabeled data is available by exploiting a contextualized augmentation method to generate data. Experimental results on two public datasets demonstrate the effectiveness of GradLRE on low-resource relation extraction when compared with baselines.
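A small sketch of the gradient-imitation signal as it is described in the abstract: the reward for a pseudo-labeled batch is the cosine similarity between the gradient it induces and the gradient induced by labeled data. The model and data below are toy placeholders, not GradLRE's architecture.

```python
# Hypothetical gradient-imitation reward: cosine similarity between gradients.
import torch

model = torch.nn.Linear(16, 4)                       # stand-in relation classifier
loss_fn = torch.nn.CrossEntropyLoss()

def flat_grad(loss: torch.Tensor) -> torch.Tensor:
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.flatten() for g in grads])

x_labeled, y_labeled = torch.randn(8, 16), torch.randint(0, 4, (8,))
x_pseudo = torch.randn(8, 16)
y_pseudo = model(x_pseudo).argmax(dim=-1)            # pseudo labels from the current model

g_labeled = flat_grad(loss_fn(model(x_labeled), y_labeled))
g_pseudo = flat_grad(loss_fn(model(x_pseudo), y_pseudo))

reward = torch.nn.functional.cosine_similarity(g_labeled, g_pseudo, dim=0)
print("gradient-imitation reward:", reward.item())   # used to reinforce or discard the pseudo batch
```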
Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged? Using such interaction logs in an offline reinforcement learning (RL) setting is a promising approach. However, due to the nature of NLP tasks and the constraints of production systems, a series of challenges arise. We present a concise overview of these challenges and discuss possible solutions.
To date, most abstractive summarisation models have relied on variants of the negative log-likelihood (NLL) as their training objective. In some cases, reinforcement learning has been added to train the models with an objective that is closer to their evaluation measures (e.g. ROUGE). However, the reward function used within the reinforcement learning approach can play a key role in performance and is still partially unexplored. For this reason, in this paper we propose two reward functions for the task of abstractive summarisation: the first function, referred to as RwB-Hinge, dynamically selects the samples for the gradient update; the second function, nicknamed RISK, leverages a small pool of strong candidates to inform the reward. In the experiments, we probe the proposed approach by fine-tuning an NLL pre-trained model over nine summarisation datasets of diverse size and nature. The experimental results show a consistent improvement over the negative log-likelihood baselines.
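Hedged sketches of the two reward shapes, written from the abstract rather than the paper's exact equations: RwB-Hinge is read here as self-critical REINFORCE that only keeps samples beating a greedy baseline, and RISK as minimum-risk training over a small candidate pool. The log-probabilities, ROUGE-like rewards, and batch sizes are made-up placeholders.

```python
# Hypothetical readings of RwB-Hinge and RISK as training losses.
import torch

def rwb_hinge_loss(logp_sample, r_sample, r_greedy):
    """Keep the gradient only when the sampled summary beats the greedy baseline."""
    advantage = torch.clamp(r_sample - r_greedy, min=0.0)   # hinge: drop non-improving samples
    return -(advantage.detach() * logp_sample).mean()

def risk_loss(logp_candidates, rewards):
    """Expected risk (1 - reward) under the renormalized candidate distribution."""
    probs = torch.softmax(logp_candidates, dim=-1)
    return (probs * (1.0 - rewards)).sum(dim=-1).mean()

# Toy usage with made-up log-probs and ROUGE-like rewards for a batch of 2.
logp_sample = torch.tensor([-12.3, -9.1], requires_grad=True)
print(rwb_hinge_loss(logp_sample, torch.tensor([0.41, 0.28]), torch.tensor([0.35, 0.33])))

logp_pool = torch.randn(2, 4, requires_grad=True)            # 4 strong candidates per source
rewards = torch.rand(2, 4)                                    # e.g. ROUGE of each candidate
print(risk_loss(logp_pool, rewards))
```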
Supporting the tunnel face with fiber glass pipes is an effective method to maintain the stability of the tunnel face, thereby reducing surface settlements and face deformation and maintaining the safety of the workers and the machinery used in tunneling. This paper presents the results of finite-difference numerical analyses (FLAC3D program) on the behavior of a shallow tunnel face reinforced by longitudinal fiber glass pipes. A 3D numerical model has been calibrated and used to demonstrate the effectiveness of this technique and to perform a parametric study determining the critical reinforcement parameters: the density (number of pipes, A) and the length (L). The results indicate that face reinforcement using longitudinal fiber glass pipes can significantly reduce the movements (face displacement and surface settlements) and thus improve face stability. These movements decrease as the pipe length increases and as the pipe density increases up to the critical density; once the critical length is reached, the pipes must be renewed to maintain the stability of the tunneling process.
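An illustrative scaffold for the parametric study only: run_face_model is a hypothetical wrapper around the calibrated FLAC3D model (the actual analyses are run in FLAC3D itself), and the density and length grids are example values, not those of the study.

```python
# Hypothetical parametric sweep over pipe density (A) and length (L).
from itertools import product

def run_face_model(n_pipes: int, pipe_length_m: float) -> dict:
    """Placeholder: would launch the FLAC3D run and return the monitored movements."""
    return {"face_displacement_mm": 0.0, "surface_settlement_mm": 0.0}

pipe_counts = [0, 10, 20, 30, 40]          # density A: number of fiber glass pipes
pipe_lengths = [6.0, 9.0, 12.0, 15.0]      # length L in metres

results = {(n, L): run_face_model(n, L) for n, L in product(pipe_counts, pipe_lengths)}

# From such a table one would read off the critical density (where adding pipes
# stops reducing movements) and the critical length governing pipe renewal.
```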
In this paper, two artificial intelligence techniques, the ant colony optimization algorithm and the genetic algorithm, are merged to optimize a recurrent reinforcement learning trading system. The proposed trading system uses the ant colony optimization algorithm and the genetic algorithm to select an optimal group of technical and fundamental indicators.
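A minimal genetic-algorithm sketch of the indicator-selection step; the indicator list, GA settings, and fitness function are illustrative assumptions. In the proposed system, fitness would come from the recurrent reinforcement learning trader's performance (e.g. a Sharpe-style measure), and an ant colony search would be combined with this selection.

```python
# Hypothetical GA over binary masks selecting technical and fundamental indicators.
import random

INDICATORS = ["RSI", "MACD", "SMA", "EMA", "OBV", "P/E", "ROE", "EPS"]

def fitness(mask):
    """Placeholder: train/evaluate the recurrent RL trader on the selected indicators."""
    return random.random() if any(mask) else 0.0

def evolve(pop_size=20, generations=30, mutation_rate=0.1):
    population = [[random.randint(0, 1) for _ in INDICATORS] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]                       # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(INDICATORS))          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < mutation_rate else g for g in child]  # mutation
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return [name for name, g in zip(INDICATORS, best) if g]

print("selected indicators:", evolve())
```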
This paper presents a method for finding online adaptive optimal controllers for continuous-time linear systems without knowing the system dynamical matrices. The proposed method employs adaptive dynamic programming, an intelligent operations-research technique, to iteratively solve the algebraic Riccati equation using online state and input information, without requiring a priori knowledge of the system dynamics. In addition, all iterations can be conducted by repeatedly using the same state and input information on some fixed time intervals. A practical online algorithm is developed in this paper and is applied to the controller design for a turbocharged diesel engine with exhaust gas recirculation.
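A hedged sketch of the policy-iteration structure behind such an approach: the classical Kleinman iteration shown here needs the system matrices A and B, whereas the adaptive dynamic programming scheme reproduces the same iteration from online state and input data without knowing A and B. The matrices below are a toy stable example, not the diesel-engine model.

```python
# Model-based counterpart (Kleinman policy iteration) of the data-driven ADP scheme.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

K = np.zeros((1, 2))                       # initial stabilizing gain (A is already stable here)
for _ in range(10):
    Ak = A - B @ K
    # Policy evaluation: solve the Lyapunov equation Ak' P + P Ak + Q + K' R K = 0.
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B' P.
    K = np.linalg.solve(R, B.T @ P)

print("iterative P:\n", P)
print("ARE solution:\n", solve_continuous_are(A, B, Q, R))   # should match closely
```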