بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Multialternative Neural Decision Processes

320 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Fabio Angelo Maccheroni

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية اقتصاد

والبحث باللغة English

تأليف Carlo Baldassi - Simone Cerreia-Vioglio - Fabio Maccheroni

الذكاء الاصطناعي الاقتصاد النظري الخلايا العصبية والإدراك

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We introduce an algorithmic decision process for multialternative choice that combines binary comparisons and Markovian exploration. We show that a preferential property, transitivity, makes it testable.

قيم البحث

124 - Manfred Jaeger , Giorgio Bacci , Giovanni Bacci 2020

Euclidean Markov decision processes are a powerful tool for modeling control problems under uncertainty over continuous domains. Finite state imprecise, Markov decision processes can be used to approximate the behavior of these infinite models. In th is paper we address two questions: first, we investigate what kind of approximation guarantees are obtained when the Euclidean process is approximated by finite state approximations induced by increasingly fine partitions of the continuous state space. We show that for cost functions over finite time horizons the approximations become arbitrarily precise. Second, we use imprecise Markov decision process approximations as a tool to analyse and validate cost functions and strategies obtained by reinforcement learning. We find that, on the one hand, our new theoretical results validate basic design choices of a previously proposed reinforcement learning approach. On the other hand, the imprecise Markov decision process approximations reveal some inaccuracies in the learned cost functions.

الذكاء الاصطناعي

A Gauss-Newton Method for Markov Decision Processes

63 - Thomas Furmston , Guy Lever 2015

Approximate Newton methods are a standard optimization tool which aim to maintain the benefits of Newtons method, such as a fast rate of convergence, whilst alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov Decision Processes (MDPs). We first analyse the structure of the Hessian of the objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs and we use this analysis to motivate two Gauss-Newton Methods for MDPs. Like the Gauss-Newton method for non-linear least squares, these methods involve approximating the Hessian by ignoring certain terms in the Hessian which are difficult to estimate. The approximate Hessians possess desirable properties, such as negative definiteness, and we demonstrate several important performance guarantees including guaranteed ascent directions, invariance to affine transformation of the parameter space, and convergence guarantees. We finally provide a unifying perspective of key policy search algorithms, demonstrating that our second Gauss-Newton algorithm is closely related to both the EM-algorithm and natural gradient ascent applied to MDPs, but performs significantly better in practice on a range of challenging domains.

الذكاء الاصطناعي التعلم الآلي التعلم الالي

Policy Iteration for Decentralized Control of Markov Decision Processes

411 - Daniel S. Bernstein , Christopher Amato , Eric A. Hansen 2014

Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision proces s (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.

الذكاء الاصطناعي

Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes

115 - Kunal Menda , Yi-Chun Chen , Justin Grana 2017

The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, mu ltiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies generalized advantage estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as event-driven decision processes, which are scenarios in which the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely separated events occur, adversely affecting the policies learned. In addition, we show that arbitrarily shrinking the time-step scales poorly with the number of agents.

الذكاء الاصطناعي أنظمة متعددة العملاء

Modelling Cyber-Security Experts Decision Making Processes using Aggregation Operators

155 - Simon Miller , Christian Wagner , Uwe Aickelin 2016

An important role carried out by cyber-security experts is the assessment of proposed computer systems, during their design stage. This task is fraught with difficulties and uncertainty, making the knowledge provided by human experts essential for su ccessful assessment. Today, the increasing number of progressively complex systems has led to an urgent need to produce tools that support the expert-led process of system-security assessment. In this research, we use weighted averages (WAs) and ordered weighted averages (OWAs) with evolutionary algorithms (EAs) to create aggregation operators that model parts of the assessment process. We show how individual overall ratings for security components can be produced from ratings of their characteristics, and how these individual overall ratings can be aggregated to produce overall rankings of potential attacks on a system. As well as the identification of salient attacks and weak points in a prospective system, the proposed method also highlights which factors and security components contribute most to a components difficulty and attack ranking respectively. A real world scenario is used in which experts were asked to rank a set of technical attacks, and to answer a series of questions about the security components that are the subject of the attacks. The work shows how finding good aggregation operators, and identifying important components and factors of a cyber-security problem can be automated. The resulting operators have the potential for use as decision aids for systems designers and cyber-security experts, increasing the amount of assessment that can be achieved with the limited resources available.

الذكاء الاصطناعي التشفير والأمن

الأسئلة المقترحة

ما العلاقة بين الذكاء الاصطناعي وتعلم الآلة؟

1994 - 0 - - Shamra Editor تم طرحه بمساحة ( الهندسة المعلوماتية)

الذكاء الاصطناعي

كيف يمكن استخدام الذكاء الصنعي والتعلم الآلي في التعليم؟

1642 - 0 - - Shamra Editor تم طرحه بمساحة ( الهندسة المعلوماتية)

الذكاء الاصطناعي

ماذا يعني التنقيب عن البيانات؟

2362 - 0 - - Ahmad Ali تم طرحه بمساحة ( الهندسة المعلوماتية)

الذكاء الاصطناعي

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة المستنصرية

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Multialternative Neural Decision Processes

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

We introduce an algorithmic decision process for multialternative choice that combines binary comparisons and Markovian exploration. We show that a preferential property, transitivity, makes it testable.

اقرأ أيضاً

الأسئلة المقترحة