ترغب بنشر مسار تعليمي؟ اضغط هنا

Simple Strategies in Multi-Objective MDPs (Technical Report)

188   0   0.0 ( 0 )
 نشر من قبل Tim Quatmann
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We consider the verification of multiple expected reward objectives at once on Markov decision processes (MDPs). This enables a trade-off analysis among multiple objectives by obtaining the Pareto front. We focus on strategies that are easy to employ and implement. That is, strategies that are pure (no randomization) and have bounded memory. We show that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and we provide an MILP encoding to solve the corresponding problem. The bounded memory case can be reduced to the stationary one by a product construction. Experimental results using Storm and Gurobi show the feasibility of our algorithms.



قيم البحث

اقرأ أيضاً

We propose a new global SPACING constraint that is useful in modeling events that are distributed over time, like learning units scheduled over a study program or repeated patterns in music compositions. First, we investigate theoretical properties o f the constraint and identify tractable special cases. We propose efficient DC filtering algorithms for these cases. Then, we experimentally evaluate performance of the proposed algorithms on a music composition problem and demonstrate that our filtering algorithms outperform the state-of-the-art approach for solving this problem.
71 - Chao Xu , Yiping Xie , Xijun Wang 2021
Recently, we have struck the balance between the information freshness, in terms of age of information (AoI), experienced by users and energy consumed by sensors, by appropriately activating sensors to update their current status in caching enabled I nternet of Things (IoT) networks [1]. To solve this problem, we cast the corresponding status update procedure as a continuing Markov Decision Process (MDP) (i.e., without termination states), where the number of state-action pairs increases exponentially with respect to the number of considered sensors and users. Moreover, to circumvent the curse of dimensionality, we have established a methodology for designing deep reinforcement learning (DRL) algorithms to maximize (resp. minimize) the average reward (resp. cost), by integrating R-learning, a tabular reinforcement learning (RL) algorithm tailored for maximizing the long-term average reward, and traditional DRL algorithms, initially developed to optimize the discounted long-term cumulative reward rather than the average one. In this technical report, we would present detailed discussions on the technical contributions of this methodology.
This paper provides several optimizations of the rank-based approach for complementing B{u}chi automata. We start with Schewes theoretically optimal construction and develop a set of techniques for pruning its state space that are key to obtaining sm all complement automata in practice. In particular, the reductions (except one) have the property that they preserve (at least some) so-called super-tight runs, which are runs whose ranking is as tight as possible. Our evaluation on a large benchmark shows that the optimizations indeed significantly help the rank-based approach and that, in a large number of cases, the obtained complement is the smallest from those produced by a large number of state-of-the-art tools for B{u}chi complementation.
Meta learning with multiple objectives can be formulated as a Multi-Objective Bi-Level optimization Problem (MOBLP) where the upper-level subproblem is to solve several possible conflicting targets for the meta learner. However, existing studies eith er apply an inefficient evolutionary algorithm or linearly combine multiple objectives as a single-objective problem with the need to tune combination weights. In this paper, we propose a unified gradient-based Multi-Objective Meta Learning (MOML) framework and devise the first gradient-based optimization algorithm to solve the MOBLP by alternatively solving the lower-level and upper-level subproblems via the gradient descent method and the gradient-based multi-objective optimization method, respectively. Theoretically, we prove the convergence properties of the proposed gradient-based optimization algorithm. Empirically, we show the effectiveness of the proposed MOML framework in several meta learning problems, including few-shot learning, neural architecture search, domain adaptation, and multi-task learning.
189 - Bart Bogaerts 2019
Since the first conference held in Marseille in 1982, ICLP has been the premier international event for presenting research in logic programming. Contributions are sought in all areas of logic programming, including but not restricted to: Foundatio ns: Semantics, Formalisms, Nonmonotonic reasoning, Knowledge representation. Languages: Concurrency, Objects, Coordination, Mobility, Higher Order, Types, Modes, Assertions, Modules, Meta-programming, Logic-based domain-specific languages, Programming Techniques. Declarative programming: Declarative program development, Analysis, Type and mode inference, Partial evaluation, Abstract interpretation, Transformation, Validation, Verification, Debugging, Profiling, Testing, Execution visualization Implementation: Virtual machines, Compilation, Memory management, Parallel/distributed execution, Constraint handling rules, Tabling, Foreign interfaces, User interfaces. Related Paradigms and Synergies: Inductive and Co-inductive Logic Programming, Constraint Logic Programming, Answer Set Programming, Interaction with SAT, SMT and CSP solvers, Logic programming techniques for type inference and theorem proving, Argumentation, Probabilistic Logic Programming, Relations to object-oriented and Functional programming. Applications: Databases, Big Data, Data integration and federation, Software engineering, Natural language processing, Web and Semantic Web, Agents, Artificial intelligence, Computational life sciences, Education, Cybersecurity, and Robotics.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا