
Implementing Access Control Markov Decision Processes with GLPK/GMPL

Added by Charles Morisset
Publication date: 2012
Research language: English





In a recent approach, we proposed to model an access control mechanism as a Markov Decision Process, thus claiming that, in order to make an access control decision, one can use well-defined mechanisms from decision theory. In this paper we present an implementation of such a mechanism, using the open-source solver GLPK, and we model the problem in the GMPL language. We illustrate our approach with a simple, yet expressive example, and we show how varying some parameters can change the final outcome. In particular, we show that in addition to returning a decision, we can also calculate the value of each decision.
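To give a concrete sense of what such an encoding looks like, below is a minimal GMPL (MathProg) sketch of the standard linear-programming formulation of a discounted MDP, solvable with GLPK's glpsol. It is not the model file from the paper: the states, decisions, rewards and transition probabilities are invented for illustration.

/* Minimal GMPL sketch: a discounted MDP encoded as a linear program.
   All names and numbers below are illustrative, not taken from the paper. */

set S;                            /* states of the system being protected */
set A;                            /* access control decisions (actions) */

param r{S, A};                    /* immediate reward of decision a in state s */
param P{S, A, S}, >= 0, <= 1;     /* transition probabilities */
param gamma, > 0, < 1;            /* discount factor */

var V{S};                         /* value of each state */

/* LP formulation of the discounted MDP: minimise the total value subject to
   the Bellman inequalities; at the optimum, V is the optimal value function
   and, for each state, a decision whose constraint is tight is optimal. */
minimize total_value: sum{s in S} V[s];

s.t. bellman{s in S, a in A}:
  V[s] >= r[s,a] + gamma * sum{t in S} P[s,a,t] * V[t];

solve;
display V;

data;

set S := safe compromised;
set A := grant deny;

param gamma := 0.9;

param r :          grant  deny :=
  safe               1      0
  compromised       -2      0 ;

param P :=
  [safe, grant, *]         safe 0.9  compromised 0.1
  [safe, deny, *]          safe 1.0  compromised 0.0
  [compromised, grant, *]  safe 0.2  compromised 0.8
  [compromised, deny, *]   safe 0.5  compromised 0.5 ;

end;

Running a file like this with glpsol --math model.mod prints the value of each state; for a given state, the right-hand side of each Bellman constraint is the value of the corresponding decision, which is the sense in which the approach can return not only a decision but also the value of every decision.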




Read More

The objective of this work is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite-time horizon discounted cost. The continuous-time controlled process is shown to be non-explosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure, on the one hand, the existence of an optimal control strategy and, on the other hand, the existence of an $\varepsilon$-optimal control strategy. A decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual action or an impulsive action, respectively, in order to obtain an optimal or $\varepsilon$-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time $t=0$ and only immediately after natural jumps is a sufficient set for the control problem under consideration.
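As a rough schematic (generic notation, not the exact assumptions of the paper), the optimality equation for such impulse control problems compares continuing under a gradual control against intervening immediately:
$$ V(x) = \min\big\{\, (H V)(x),\; (M V)(x) \,\big\}, \qquad (M V)(x) = \inf_{y} \big[ c_I(x, y) + V(y) \big], $$
where $H$ is the operator associated with the gradual (continuous) controls and $M$ is the intervention operator. The two-set decomposition mentioned above then corresponds to the region where $V(x) < (M V)(x)$, in which a gradual action is applied, and the region where $V(x) = (M V)(x)$, in which an impulsive action is applied.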
Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.
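For reference (standard notation, not necessarily the paper's), a stochastic finite-state controller for agent $i$ consists of a node set $Q_i$, an action-selection distribution $\psi_i(a_i \mid q_i)$ and a node-transition distribution $\eta_i(q_i' \mid q_i, a_i, o_i)$; a correlation device is a shared random signal $c$ with dynamics $\gamma(c' \mid c)$ on which both distributions may additionally condition, so that agents can correlate their behaviour without communicating during execution.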
Security researchers have stated that the core concept behind current implementations of access control predates the Internet. These assertions are made to pinpoint that there is a foundational gap in this field, and one should consider revisiting the concepts from the ground up. Moreover, insider threats, which are an increasing threat vector against organizations, are also associated with the failure of access control. Access control models derived from the access control matrix encompass three sets of entities: Subjects, Objects and Operations. Typically, objects are considered to be files and operations are regarded as Read, Write, and Execute. This implies an `open sesame' approach when granting access to data, i.e. once access is granted, there is no restriction on command executions. Inspired by Functional Encryption, we propose applying access authorizations at a much finer granularity, but instead of an ad-hoc or computationally hard cryptographic approach, we postulate a foundational transformation to access control. From an abstract viewpoint, we suggest storing access authorizations as a three-dimensional tensor, which we call the Access Control Tensor (ACT). In Function-based Access Control (FBAC), applications are not given blindfolded execution rights and can only invoke commands that have been authorized for data segments. In other words, one might be authorized to use a certain command on one object, while being forbidden to use exactly the same command on another object. The theoretical foundations of FBAC are presented along with its Policy, Enforcement and Implementation (PEI) requirements. A critical analysis of the advantages of deploying FBAC, how it will result in developing a new generation of applications, and its compatibility with existing models and systems is also included. Finally, a proof of concept implementation of FBAC is presented.
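Schematically (notation assumed here, not taken verbatim from the paper), with subjects $S$, objects $O$ and functions (commands) $F$, the Access Control Tensor can be read as a map $T : S \times O \times F \to \{0, 1\}$, where a subject $s$ may invoke function $f$ on object $o$ if and only if $T[s, o, f] = 1$; in particular, $T[s, o, f] = 1$ together with $T[s, o', f] = 0$ captures being allowed to run a command on one object while being forbidden to run the same command on another.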
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics. In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback: the trajectory is revealed along with the cumulative loss suffered, rather than the individual losses encountered along the trajectory. Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes. We establish this result via an efficient reduction to a novel bandit learning setting we call Distorted Linear Bandits (DLB), which is a variant of bandit linear optimization where actions chosen by the learner are adversarially distorted before they are committed. We then develop a computationally efficient online algorithm for DLB for which we prove an $O(\sqrt{T})$ regret bound, where $T$ is the number of time steps. Our algorithm is based on online mirror descent with a self-concordant barrier regularization that employs a novel increasing learning rate schedule.
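Schematically (generic notation, not the paper's exact algorithm), an online mirror descent step of this kind takes the form $x_{t+1} = \arg\min_{x \in \mathcal{K}} \; \eta_t \langle \hat{\ell}_t, x \rangle + D_R(x, x_t)$, where $\mathcal{K}$ is the decision set, $\hat{\ell}_t$ is a loss estimate built from the aggregate feedback, $R$ is a self-concordant barrier for $\mathcal{K}$ with Bregman divergence $D_R$, and $\eta_t$ is the (increasing) learning-rate schedule.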
Mridul Agarwal, Qinbo Bai (2021)
We consider the problem of the constrained Markov Decision Process (CMDP), where an agent interacts with a unichain Markov Decision Process. At every interaction, the agent obtains a reward. Further, there are $K$ cost functions. The agent aims to maximize the long-term average reward while simultaneously keeping the $K$ long-term average costs lower than a certain threshold. In this paper, we propose CMDP-PSRL, a posterior-sampling-based algorithm with which the agent can learn optimal policies to interact with the CMDP. Further, for an MDP with $S$ states, $A$ actions, and diameter $D$, we prove that, following the CMDP-PSRL algorithm, the agent can bound the regret of not accumulating rewards from the optimal policy by $\tilde{O}(\mathrm{poly}(DSA)\sqrt{T})$. Further, we show that the violations of any of the $K$ constraints are also bounded by $\tilde{O}(\mathrm{poly}(DSA)\sqrt{T})$. To the best of our knowledge, this is the first work that obtains $\tilde{O}(\sqrt{T})$ regret bounds for ergodic MDPs with long-term average constraints.
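In symbols (standard CMDP notation, assumed here rather than quoted from the paper), the agent's problem is
$$\max_{\pi}\ \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\Big[\sum_{t=1}^{T} r(s_t, a_t)\Big] \quad \text{s.t.} \quad \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\Big[\sum_{t=1}^{T} c_k(s_t, a_t)\Big] \le C_k, \quad k = 1, \dots, K,$$
where $r$ is the reward function, $c_k$ the $k$-th cost function, and $C_k$ the corresponding threshold.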
