Deep reinforcement learning (RL) methods often require many trials before convergence and provide no direct interpretability of the trained policies. To achieve fast convergence and policy interpretability in RL, we propose a novel RL method for text-based games built on a recent neuro-symbolic framework, the Logical Neural Network, which can learn symbolic and interpretable rules in its differentiable network. The method first extracts first-order logical facts from the text observation and an external word-meaning network (ConceptNet), and then trains a policy in the network with directly interpretable logical operators. Our experimental results show that RL training with the proposed method converges significantly faster than other state-of-the-art neuro-symbolic methods on a TextWorld benchmark.
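A minimal sketch of the idea, not the paper's pipeline or the Logical Neural Network library API: pattern-based fact extraction from a TextWorld-style observation, a toy stand-in for ConceptNet lookups, and a differentiable AND node scoring one action. The rule template and fact patterns are illustrative assumptions.

```python
# Hypothetical, simplified sketch: extract first-order facts from text, add
# word-meaning facts, and score an action with a soft differentiable AND.
import re
import torch

def extract_facts(observation: str) -> dict:
    """Rough pattern-based extraction, e.g. 'a carrot on the counter' -> ('on', 'carrot', 'counter')."""
    facts = {}
    for obj, loc in re.findall(r"a (\w+) on the (\w+)", observation):
        facts[("on", obj, loc)] = 1.0          # observed facts get truth value 1.0
    return facts

# Toy stand-in for a ConceptNet lookup: word-meaning facts added to the state.
CONCEPT_FACTS = {("isa", "carrot", "food"): 1.0, ("isa", "knife", "tool"): 1.0}

class AndRule(torch.nn.Module):
    """Differentiable conjunction with learnable weights, loosely in the spirit of an LNN AND node."""
    def __init__(self, n_inputs: int):
        super().__init__()
        self.w = torch.nn.Parameter(torch.ones(n_inputs))

    def forward(self, truths: torch.Tensor) -> torch.Tensor:
        # Weighted product t-norm: output near 1 only when all inputs are near 1.
        return torch.prod(truths ** torch.softmax(self.w, dim=0))

# Rule for the action "take carrot": isa(carrot, food) AND on(carrot, counter).
rule = AndRule(n_inputs=2)
state = {**extract_facts("There is a carrot on the counter."), **CONCEPT_FACTS}
truths = torch.tensor([state.get(("isa", "carrot", "food"), 0.0),
                       state.get(("on", "carrot", "counter"), 0.0)])
print("score for 'take carrot':", rule(truths).item())
```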
Most reinforcement learning methods for dialog policy learning train a centralized agent that selects a predefined joint action concatenating domain name, intent type, and slot name. Because of the large action space, the centralized dialog agent requires a great many user-agent interactions. Besides, designing the concatenated actions is laborious for engineers and may struggle with edge cases. To solve these problems, we model the dialog policy learning problem with a novel multi-agent framework, in which each part of the action is handled by a different agent. The framework reduces the labor cost of action templates and decreases the size of the action space for each agent. Furthermore, we relieve the non-stationarity caused by the changing dynamics of the environment as the agents' policies evolve by introducing a joint optimization process that allows agents to exchange their policy information. Concurrently, an independent experience replay buffer mechanism is integrated to reduce the dependence between the gradients of samples and improve training efficiency. The effectiveness of the proposed framework is demonstrated in a multi-domain environment with both user-simulator evaluation and human evaluation.
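A structural sketch only, not the paper's implementation: one small policy head per action component (domain, intent, slot), each paired with its own replay buffer so minibatches are sampled independently. The state size, sub-action vocabulary sizes, and training loop are assumptions made for illustration.

```python
# Hypothetical multi-agent decomposition of a dialog act into domain/intent/slot.
from collections import deque
import torch

SUB_ACTIONS = {"domain": 5, "intent": 8, "slot": 12}   # illustrative vocabulary sizes
STATE_DIM = 32

class SubAgent:
    def __init__(self, n_actions: int):
        self.policy = torch.nn.Sequential(torch.nn.Linear(STATE_DIM, 64),
                                          torch.nn.ReLU(),
                                          torch.nn.Linear(64, n_actions))
        self.buffer = deque(maxlen=10_000)               # independent experience replay
        self.opt = torch.optim.Adam(self.policy.parameters(), lr=1e-3)

    def act(self, state: torch.Tensor) -> int:
        probs = torch.softmax(self.policy(state), dim=-1)
        return torch.multinomial(probs, 1).item()

agents = {name: SubAgent(n) for name, n in SUB_ACTIONS.items()}

state = torch.randn(STATE_DIM)
joint_action = {name: agent.act(state) for name, agent in agents.items()}
print("joint dialog act:", joint_action)   # e.g. {'domain': 2, 'intent': 5, 'slot': 7}

# Each agent stores and later samples its own transitions; the joint optimization
# and policy-information exchange described in the abstract would plug in here.
for name, agent in agents.items():
    agent.buffer.append((state, joint_action[name], 0.0, state))
```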
Deep reinforcement learning provides a promising approach for text-based games in studying natural language communication between humans and artificial agents. However, generalization remains a big challenge, as agents depend critically on the complexity and variety of the training tasks. In this paper, we address this problem by introducing a hierarchical framework built upon a knowledge graph (KG)-based RL agent. At the high level, a meta-policy is executed to decompose the whole game into a set of subtasks specified by textual goals and to select one of them based on the KG. A sub-policy at the low level is then executed to conduct goal-conditioned reinforcement learning. We carry out experiments on games with various difficulty levels and show that the proposed method enjoys favorable generalizability.
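A minimal control-flow sketch of the hierarchy described above; the subgoal-selection rule, the knowledge graph contents, and the environment interface are placeholders rather than the paper's agent.

```python
# Hypothetical meta-policy / goal-conditioned sub-policy decomposition.
import random

def meta_policy(knowledge_graph: set) -> str:
    """Pick one textual subgoal; a learned scorer over the KG would go here."""
    candidates = [f"take {obj}" for (rel, obj, loc) in knowledge_graph if rel == "at"]
    return random.choice(candidates)

def sub_policy(goal: str, observation: str) -> str:
    """Goal-conditioned low-level policy; here just a stub keyed on the goal object."""
    return goal if goal.split()[-1] in observation else "look"

knowledge_graph = {("at", "key", "kitchen"), ("at", "apple", "garden")}
goal = meta_policy(knowledge_graph)
for step in range(3):                                  # low-level rollout for one subtask
    observation = "You are in the kitchen. A key is here."
    action = sub_policy(goal, observation)
    print(f"goal={goal!r} -> action={action!r}")
```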
Low-resource Relation Extraction (LRE) aims to extract relation facts from limited labeled corpora when human annotation is scarce. Existing works either utilize a self-training scheme to generate pseudo labels, which causes the gradual drift problem, or leverage a meta-learning scheme that does not solicit feedback explicitly. To alleviate the selection bias caused by the lack of feedback loops in existing LRE learning paradigms, we develop a Gradient Imitation Reinforcement Learning method that encourages pseudo-labeled data to imitate the gradient descent direction on labeled data and bootstraps its optimization capability through trial and error. We also propose a framework called GradLRE, which handles two major scenarios in low-resource relation extraction. Besides the scenario where unlabeled data is sufficient, GradLRE handles the situation where no unlabeled data is available by exploiting a contextualized augmentation method to generate data. Experimental results on two public datasets demonstrate the effectiveness of GradLRE on low-resource relation extraction when compared with baselines.
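A small sketch of the gradient-imitation signal as it is described in the abstract: the reward for a pseudo-labeled batch is the cosine similarity between the gradient it induces and the gradient induced by labeled data. The model and data below are toy placeholders, not GradLRE's architecture.

```python
# Hypothetical gradient-imitation reward: cosine similarity between gradients.
import torch

model = torch.nn.Linear(16, 4)                       # stand-in relation classifier
loss_fn = torch.nn.CrossEntropyLoss()

def flat_grad(loss: torch.Tensor) -> torch.Tensor:
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.flatten() for g in grads])

x_labeled, y_labeled = torch.randn(8, 16), torch.randint(0, 4, (8,))
x_pseudo = torch.randn(8, 16)
y_pseudo = model(x_pseudo).argmax(dim=-1)            # pseudo labels from the current model

g_labeled = flat_grad(loss_fn(model(x_labeled), y_labeled))
g_pseudo = flat_grad(loss_fn(model(x_pseudo), y_pseudo))

reward = torch.nn.functional.cosine_similarity(g_labeled, g_pseudo, dim=0)
print("gradient-imitation reward:", reward.item())   # used to reinforce or discard the pseudo batch
```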
Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged? Using such interaction logs in an offline reinforcement learning (RL) setting is a promising approach. However, due to the nature of NLP tasks and the constraints of production systems, a series of challenges arise. We present a concise overview of these challenges and discuss possible solutions.
To date, most abstractive summarisation models have relied on variants of the negative log-likelihood (NLL) as their training objective. In some cases, reinforcement learning has been added to train the models with an objective that is closer to their evaluation measures (e.g. ROUGE). However, the reward function used within the reinforcement learning approach can play a key role in performance and is still partially unexplored. For this reason, in this paper we propose two reward functions for the task of abstractive summarisation: the first function, referred to as RwB-Hinge, dynamically selects the samples for the gradient update; the second function, nicknamed RISK, leverages a small pool of strong candidates to inform the reward. In the experiments, we probe the proposed approach by fine-tuning an NLL pre-trained model over nine summarisation datasets of diverse size and nature. The experimental results show a consistent improvement over the negative log-likelihood baselines.
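Hedged sketches of the two reward shapes, written from the abstract rather than the paper's exact equations: RwB-Hinge is read here as self-critical REINFORCE that only keeps samples beating a greedy baseline, and RISK as minimum-risk training over a small candidate pool. The log-probabilities, ROUGE-like rewards, and batch sizes are made-up placeholders.

```python
# Hypothetical readings of RwB-Hinge and RISK as training losses.
import torch

def rwb_hinge_loss(logp_sample, r_sample, r_greedy):
    """Keep the gradient only when the sampled summary beats the greedy baseline."""
    advantage = torch.clamp(r_sample - r_greedy, min=0.0)   # hinge: drop non-improving samples
    return -(advantage.detach() * logp_sample).mean()

def risk_loss(logp_candidates, rewards):
    """Expected risk (1 - reward) under the renormalized candidate distribution."""
    probs = torch.softmax(logp_candidates, dim=-1)
    return (probs * (1.0 - rewards)).sum(dim=-1).mean()

# Toy usage with made-up log-probs and ROUGE-like rewards for a batch of 2.
logp_sample = torch.tensor([-12.3, -9.1], requires_grad=True)
print(rwb_hinge_loss(logp_sample, torch.tensor([0.41, 0.28]), torch.tensor([0.35, 0.33])))

logp_pool = torch.randn(2, 4, requires_grad=True)            # 4 strong candidates per source
rewards = torch.rand(2, 4)                                    # e.g. ROUGE of each candidate
print(risk_loss(logp_pool, rewards))
```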
Supporting the tunnel face with fiber glass pipes is an effective method to maintain the stability of the tunnel face, thereby reducing surface settlements and face deformation and maintaining the safety of the workers and the machinery used in tunneling. This paper presents the results of finite-difference numerical analyses (FLAC3D program) on the behavior of a shallow tunnel face reinforced by longitudinal fiber glass pipes. A 3D numerical model has been calibrated and used to demonstrate the effectiveness of this technique and to perform a parametric study determining the critical reinforcement parameters: the density (number of pipes, A) and the length (L). The results indicate that face reinforcement using longitudinal fiber glass pipes can significantly reduce the movements (face displacement and surface settlements) and thus improve face stability. These movements decrease as the pipe length increases and as the pipe density increases up to the critical density; once the critical length is reached, the pipes must be renewed to maintain the stability of the tunneling process.
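An illustrative scaffold for the parametric study only: run_face_model is a hypothetical wrapper around the calibrated FLAC3D model (the actual analyses are run in FLAC3D itself), and the density and length grids are example values, not those of the study.

```python
# Hypothetical parametric sweep over pipe density (A) and length (L).
from itertools import product

def run_face_model(n_pipes: int, pipe_length_m: float) -> dict:
    """Placeholder: would launch the FLAC3D run and return the monitored movements."""
    return {"face_displacement_mm": 0.0, "surface_settlement_mm": 0.0}

pipe_counts = [0, 10, 20, 30, 40]          # density A: number of fiber glass pipes
pipe_lengths = [6.0, 9.0, 12.0, 15.0]      # length L in metres

results = {(n, L): run_face_model(n, L) for n, L in product(pipe_counts, pipe_lengths)}

# From such a table one would read off the critical density (where adding pipes
# stops reducing movements) and the critical length governing pipe renewal.
```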
In this paper, two artificial intelligence techniques, the ant colony optimization algorithm and the genetic algorithm, are merged to optimize a recurrent reinforcement learning trading system. The proposed trading system uses the ant colony optimization algorithm and the genetic algorithm to select an optimal group of technical and fundamental indicators.
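A minimal genetic-algorithm sketch of the indicator-selection step; the indicator list, GA settings, and fitness function are illustrative assumptions. In the proposed system, fitness would come from the recurrent reinforcement learning trader's performance (e.g. a Sharpe-style measure), and an ant colony search would be combined with this selection.

```python
# Hypothetical GA over binary masks selecting technical and fundamental indicators.
import random

INDICATORS = ["RSI", "MACD", "SMA", "EMA", "OBV", "P/E", "ROE", "EPS"]

def fitness(mask):
    """Placeholder: train/evaluate the recurrent RL trader on the selected indicators."""
    return random.random() if any(mask) else 0.0

def evolve(pop_size=20, generations=30, mutation_rate=0.1):
    population = [[random.randint(0, 1) for _ in INDICATORS] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]                       # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(INDICATORS))          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < mutation_rate else g for g in child]  # mutation
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return [name for name, g in zip(INDICATORS, best) if g]

print("selected indicators:", evolve())
```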
This paper presents a method for finding online adaptive optimal controllers for continuous-time linear systems without knowing the system dynamical matrices. The proposed method employs adaptive dynamic programming, an intelligent operations-research technique, to iteratively solve the algebraic Riccati equation using online state and input information, without requiring a priori knowledge of the system dynamics. In addition, all iterations can be conducted by repeatedly using the same state and input information on some fixed time intervals. A practical online algorithm is developed in this paper and is applied to the controller design for a turbocharged diesel engine with exhaust gas recirculation.
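A hedged sketch of the policy-iteration structure behind such an approach: the classical Kleinman iteration shown here needs the system matrices A and B, whereas the adaptive dynamic programming scheme reproduces the same iteration from online state and input data without knowing A and B. The matrices below are a toy stable example, not the diesel-engine model.

```python
# Model-based counterpart (Kleinman policy iteration) of the data-driven ADP scheme.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

K = np.zeros((1, 2))                       # initial stabilizing gain (A is already stable here)
for _ in range(10):
    Ak = A - B @ K
    # Policy evaluation: solve the Lyapunov equation Ak' P + P Ak + Q + K' R K = 0.
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B' P.
    K = np.linalg.solve(R, B.T @ P)

print("iterative P:\n", P)
print("ARE solution:\n", solve_continuous_are(A, B, Q, R))   # should match closely
```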