ﻻ يوجد ملخص باللغة العربية
Which variables does an agent have an incentive to control with its decision, and which variables does it have an incentive to respond to? We formalise these incentives, and demonstrate unique graphical criteria for detecting them in any single decision causal influence diagram. To this end, we introduce structural causal influence models, a hybrid of the influence diagram and structural causal model frameworks. Finally, we illustrate how these incentives predict agent incentives in both fairness and AI safety applications.
We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundnes
Agents are systems that optimize an objective function in an environment. Together, the goal and the environment induce secondary objectives, incentives. Modeling the agent-environment interaction using causal influence diagrams, we can answer two fu
The ability to use symbols is the pinnacle of human intelligence, but has yet to be fully replicated in machines. Here we argue that the path towards symbolically fluent artificial intelligence (AI) begins with a reinterpretation of what symbols are,
How should we decide which fairness criteria or definitions to adopt in machine learning systems? To answer this question, we must study the fairness preferences of actual users of machine learning systems. Stringent parity constraints on treatment o
By studying the underlying policies of decision-making agents, we can learn about their shortcomings and potentially improve them. Traditionally, this has been done either by examining the agents implementation, its behaviour while it is being execut