Companies require modern capital assets such as wind turbines, trains and hospital equipment to experience minimal downtime. Ideally, assets are maintained right before failure to ensure maximum availability at minimum maintenance cost. Two challenges arise here: failure times of assets are unknown a priori, and assets can be part of a larger asset network. Nowadays, assets are commonly equipped with real-time monitoring that emits alerts, typically triggered by the first signs of degradation. It therefore becomes crucial to plan maintenance using the information received via alerts, asset locations and maintenance costs. We refer to this problem as the Dynamic Traveling Maintainer Problem with Alerts (DTMPA). We propose a modeling framework for the DTMPA in which alerts are early and imperfect indicators of failures, with the objective of minimizing the discounted maintenance costs accrued over an infinite time horizon. We propose three solution methods that leverage different levels of information from the alert signals: greedy heuristics that rank assets based on proximity, urgency and economic risk; a Traveling Maintainer Heuristic that uses combinatorial optimization to minimize near-future costs; and a Deep Reinforcement Learning (DRL) method trained to minimize long-term costs using only the history of alerts. In a simulated environment, all methods approximate, for small asset networks, the optimal policies computed with access to perfect condition information. For larger networks, where computing the optimal policy is intractable, the proposed methods yield competitive maintenance policies, with DRL consistently achieving the lowest costs.
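To make the greedy ranking idea concrete, the sketch below shows a toy dispatch rule that scores alerted assets by a weighted combination of proximity, alert urgency and economic risk, then sends the maintainer to the highest-scoring asset. This is an illustrative assumption, not the paper's implementation: the asset fields, weights and scoring formula are hypothetical.

```python
# Illustrative sketch (not the paper's method): a toy greedy dispatch rule that
# ranks alerted assets by proximity, urgency and economic risk.
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    travel_time: float        # time to reach the asset from the maintainer's location
    time_since_alert: float   # time elapsed since the degradation alert (0 = no alert)
    downtime_cost: float      # cost rate incurred while the asset is down


def greedy_score(asset: Asset, w_proximity: float = 1.0,
                 w_urgency: float = 1.0, w_risk: float = 1.0) -> float:
    """Higher score = visit sooner. Weights are tunable assumptions."""
    proximity = 1.0 / (1.0 + asset.travel_time)   # closer assets score higher
    urgency = asset.time_since_alert              # older alerts score higher
    risk = asset.downtime_cost                    # costlier failures score higher
    return w_proximity * proximity + w_urgency * urgency + w_risk * risk


def next_asset_to_visit(assets):
    """Return the alerted asset with the highest greedy score, or None if no alerts."""
    alerted = [a for a in assets if a.time_since_alert > 0]
    return max(alerted, key=greedy_score, default=None)


if __name__ == "__main__":
    network = [
        Asset("turbine-A", travel_time=2.0, time_since_alert=5.0, downtime_cost=3.0),
        Asset("turbine-B", travel_time=0.5, time_since_alert=1.0, downtime_cost=8.0),
        Asset("train-C", travel_time=4.0, time_since_alert=0.0, downtime_cost=10.0),
    ]
    target = next_asset_to_visit(network)
    print("Next visit:", target.name if target else "stay idle")
```

In practice such weights would be tuned to the asset network at hand; the Traveling Maintainer Heuristic and DRL methods described above replace this one-step ranking with near-future and long-term cost optimization, respectively.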