A Deep Reinforcement Learning Approach to First-Order Logic Theorem Proving

80 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ibrahim Abdelaziz

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Maxwell Crouse - Ibrahim Abdelaziz - Bassem Makni

الذكاء الاصطناعي التعلم الآلي المنطق في علوم الحاسوب

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Automated theorem provers have traditionally relied on manually tuned heuristics to guide how they perform proof search. Deep reinforcement learning has been proposed as a way to obviate the need for such heuristics, however, its deployment in automated theorem proving remains a challenge. In this paper we introduce TRAIL, a system that applies deep reinforcement learning to saturation-based theorem proving. TRAIL leverages (a) a novel neural representation of the state of a theorem prover and (b) a novel characterization of the inference selection process in terms of an attention-based action policy. We show through systematic analysis that these mechanisms allow TRAIL to significantly outperform previous reinforcement-learning-based theorem provers on two benchmark datasets for first-order logic automated theorem proving (proving around 15% more theorems).

قيم البحث

155 - Ibrahim Abdelaziz , Veronika Thost , Maxwell Crouse 2020

Automated theorem proving in first-order logic is an active research area which is successfully supported by machine learning. While there have been various proposals for encoding logical formulas into numerical vectors -- from simple strings to more involved graph-based embeddings -- little is known about how these different encodings compare. In this paper, we study and experimentally compare pattern-based embeddings that are applied in current systems with popular graph-based encodings, most of which have not been considered in the theorem proving context before. Our experiments show that the advantages of simpler encoding schemes in terms of runtime are outdone by more complex graph-based embeddings, which yield more efficient search strategies and simpler proofs. To support this, we present a detailed analysis across several dimensions of theorem prover performance beyond just proof completion rate, thus providing empirical evidence to help guide future research on neural-guided theorem proving towards the most promising directions.

الذكاء الاصطناعي التعلم الآلي المنطق في علوم الحاسوب

Implicitly Learning to Reason in First-Order Logic

157 - Vaishak Belle , Brendan Juba 2019

We consider the problem of answering queries about formulas of first-order logic based on background knowledge partially represented explicitly as other formulas, and partially represented as examples independently drawn from a fixed probability dist ribution. PAC semantics, introduced by Valiant, is one rigorous, general proposal for learning to reason in formal languages: although weaker than classical entailment, it allows for a powerful model theoretic framework for answering queries while requiring minimal assumptions about the form of the distribution in question. To date, however, the most significant limitation of that approach, and more generally most machine learning approaches with robustness guarantees, is that the logical language is ultimately essentially propositional, with finitely many atoms. Indeed, the theoretical findings on the learning of relational theories in such generality have been resoundingly negative. This is despite the fact that first-order logic is widely argued to be most appropriate for representing human knowledge. In this work, we present a new theoretical approach to robustly learning to reason in first-order logic, and consider universally quantified clauses over a countably infinite domain. Our results exploit symmetries exhibited by constants in the language, and generalize the notion of implicit learnability to show how queries can be computed against (implicitly) learned first-order background knowledge.

الذكاء الاصطناعي

GamePad: A Learning Environment for Theorem Proving

73 - Daniel Huang , Prafulla Dhariwal , Dawn Song 2018

In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant. Interactive theorem provers such as Coq enable users to construct machine-check able proofs in a step-by-step manner. Hence, they provide an opportunity to explore theorem proving with human supervision. We use GamePad to synthesize proofs for a simple algebraic rewrite problem and train baseline models for a formalization of the Feit-Thompson theorem. We address position evaluation (i.e., predict the number of proof steps left) and tactic prediction (i.e., predict the next proof step) tasks, which arise naturally in tactic-based theorem proving.

التعلم الآلي الذكاء الاصطناعي المنطق في علوم الحاسوب

A New Approach for Resource Scheduling with Deep Reinforcement Learning

128 - Yufei Ye , Xiaoqin Ren , Jin Wang 2018

With the rapid development of deep learning, deep reinforcement learning (DRL) began to appear in the field of resource scheduling in recent years. Based on the previous research on DRL in the literature, we introduce online resource scheduling algor ithm DeepRM2 and the offline resource scheduling algorithm DeepRM_Off. Compared with the state-of-the-art DRL algorithm DeepRM and heuristic algorithms, our proposed algorithms have faster convergence speed and better scheduling efficiency with regarding to average slowdown time, job completion time and rewards.

الذكاء الاصطناعي التعلم الآلي

Joint Resource Management for MC-NOMA: A Deep Reinforcement Learning Approach

353 - Shaoyang Wang , Tiejun Lv , Wei Ni 2021

This paper presents a novel and effective deep reinforcement learning (DRL)-based approach to addressing joint resource management (JRM) in a practical multi-carrier non-orthogonal multiple access (MC-NOMA) system, where hardware sensitivity and impe rfect successive interference cancellation (SIC) are considered. We first formulate the JRM problem to maximize the weighted-sum system throughput. Then, the JRM problem is decoupled into two iterative subtasks: subcarrier assignment (SA, including user grouping) and power allocation (PA). Each subtask is a sequential decision process. Invoking a deep deterministic policy gradient algorithm, our proposed DRL-based JRM (DRL-JRM) approach jointly performs the two subtasks, where the optimization objective and constraints of the subtasks are addressed by a new joint reward and internal reward mechanism. A multi-agent structure and a convolutional neural network are adopted to reduce the complexity of the PA subtask. We also tailor the neural network structure for the stability and convergence of DRL-JRM. Corroborated by extensive experiments, the proposed DRL-JRM scheme is superior to existing alternatives in terms of system throughput and resistance to interference, especially in the presence of many users and strong inter-cell interference. DRL-JRM can flexibly meet individual service requirements of users.

الذكاء الاصطناعي التعلم الآلي