Companies require modern capital assets such as wind turbines, trains and hospital equipment to experience minimal downtime. Ideally, assets are maintained right before failure, ensuring maximum availability at minimum maintenance cost. Two challenges arise here: failure times of assets are unknown a priori, and assets can be part of a larger asset network. Nowadays, it is common for assets to be equipped with real-time monitoring that emits alerts, typically triggered by the first signs of degradation. It therefore becomes crucial to plan maintenance using the information received via alerts, together with asset locations and maintenance costs. This problem is referred to as the Dynamic Traveling Maintainer Problem with Alerts (DTMPA). We propose a modeling framework for the DTMPA in which the alerts are early and imperfect indicators of failures. The objective is to minimize discounted maintenance costs accrued over an infinite time horizon. We propose three methods to solve this problem, leveraging different information levels from the alert signals. The proposed methods comprise greedy heuristics that rank assets based on proximity, urgency and economic risk; a Traveling Maintainer Heuristic that uses combinatorial optimization to minimize near-future costs; and a Deep Reinforcement Learning (DRL) method trained to minimize long-term costs using only the history of alerts. In a simulated environment, all methods approximate, for small asset networks, the optimal policy computed with access to perfect condition information. For larger networks, where computing the optimal policy is intractable, the proposed methods yield competitive maintenance policies, with DRL consistently achieving the lowest costs.
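As a rough illustration of the kind of greedy ranking rule described above (not the paper's actual heuristics), the sketch below scores assets by combining alert age, downtime cost and travel time and dispatches the maintainer to the highest-scoring asset; all names, weights and the scoring formula are illustrative assumptions.

```python
import numpy as np

def greedy_dispatch(current_location, alert_age, downtime_cost, travel_time):
    """Toy greedy dispatching rule (illustrative, not the paper's heuristics).

    alert_age[i]      : time since asset i raised an alert (0 if no alert)
    downtime_cost[i]  : cost rate incurred while asset i is down
    travel_time[i, j] : travel time between locations i and j
    """
    if np.all(alert_age == 0):
        return None                       # no alerts: stay idle
    # Heuristic economic-risk score: older alerts and costlier downtime raise
    # urgency, while nearby assets are cheaper to reach. Weights are arbitrary.
    score = alert_age * downtime_cost / (1.0 + travel_time[current_location])
    return int(np.argmax(score))          # asset to visit next

# Example with three assets and made-up data.
tt = np.array([[0, 2, 5], [2, 0, 3], [5, 3, 0]], dtype=float)
print(greedy_dispatch(0, np.array([0.0, 4.0, 1.0]), np.array([10.0, 8.0, 20.0]), tt))
```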
This note provides upper bounds on the number of operations required to compute, via value iteration, a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given discount factor, magnitude of the reward function, and desired closeness to optimality, these upper bounds are strongly polynomial in the number of state-action pairs, and one of them is a non-decreasing function of the discount factor.
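Since the note's bounds concern value iteration for finite discounted MDPs, a minimal sketch of that algorithm may help fix ideas; the array layout, the stopping tolerance $\epsilon(1-\gamma)/(2\gamma)$ (a standard sufficient condition for the greedy policy to be $\epsilon$-optimal) and the variable names are my own choices, not taken from the note.

```python
import numpy as np

def value_iteration(P, r, gamma, eps=1e-6):
    """Value iteration sketch for a finite discounted MDP.

    P     : (A, S, S) array, P[a, s, s'] = transition probability
    r     : (S, A) array of immediate rewards
    gamma : discount factor in (0, 1)
    eps   : desired closeness to optimality of the returned greedy policy
    """
    v = np.zeros(r.shape[0])
    # Standard stopping rule: a sup-norm change below eps*(1-gamma)/(2*gamma)
    # guarantees that the greedy policy w.r.t. the new iterate is eps-optimal.
    tol = eps * (1.0 - gamma) / (2.0 * gamma)
    while True:
        q = r + gamma * (P @ v).T           # Q[s, a] = r[s, a] + gamma * E[v(s')]
        v_new = q.max(axis=1)               # Bellman optimality update
        if np.max(np.abs(v_new - v)) < tol:
            return q.argmax(axis=1), v_new  # greedy policy and value estimate
        v = v_new
```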
Many control policies used in various applications determine the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex control-Lyapunov or approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a crude grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex optimization problem with respect to its parameters. We illustrate our method on several examples.
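A minimal sketch of such a tuning loop, under stated simplifications: the efficient solution derivatives the paper relies on are replaced here by a central finite-difference estimate, the policy is a plain LQR gain obtained from a Riccati equation, and the system matrices, noise level and evaluation metric are made up. It assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])    # toy linear dynamics (made up)
B = np.array([[0.0], [0.1]])

def lqr_gain(q_weight):
    """Gain of the policy u = -K x for weights Q = q_weight * I, R = I."""
    Q, R = q_weight * np.eye(2), np.eye(1)
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def performance(q_weight, horizon=200, seed=0):
    """Simulated closed-loop cost used as the tuning metric (toy choice)."""
    rng = np.random.default_rng(seed)
    K, x, total = lqr_gain(q_weight), np.array([1.0, 0.0]), 0.0
    for _ in range(horizon):
        u = -K @ x
        total += x @ x + 0.1 * float(u @ u)  # metric need not match the LQR weights
        x = A @ x + B @ u + 0.01 * rng.standard_normal(2)
    return total

# Crude gradient descent on the scalar parameter via central differences.
q, step, h = 1.0, 0.05, 1e-3
for _ in range(20):
    grad = (performance(q + h) - performance(q - h)) / (2 * h)
    q = max(q - step * grad, 1e-3)           # keep the weight positive
print("tuned state-cost weight:", q)
```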
We consider the dynamic inventory problem with non-stationary demands. It has long been known that non-stationary (s, S) policies are optimal for this problem. However, finding optimal policy parameters remains a computational challenge, as it requires solving a large-scale stochastic dynamic program. To address this, we devise a recursion-free approximation of the optimal cost function of the problem. This enables us to compute policy parameters heuristically, without resorting to a stochastic dynamic program. The heuristic is easy to understand and use, since it relies only on elementary methods of convex minimization and shortest paths, yet it is very effective and outperforms earlier heuristics.
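To make the policy class concrete, the toy simulation below evaluates a given non-stationary (s, S) policy: at each review, if inventory is below s_t the position is raised to S_t, after which demand is subtracted and holding or backlog costs are charged. The cost parameters, demand model and the chosen levels are illustrative assumptions, not the paper's heuristic.

```python
import numpy as np

def simulate_sS(s, S, demands, K=100.0, h=1.0, b=5.0):
    """Simulate a non-stationary (s, S) policy over a finite horizon (toy sketch).

    s[t], S[t] : reorder level and order-up-to level in period t
    demands[t] : realized demand in period t
    K, h, b    : fixed ordering, per-unit holding, and per-unit backlog costs
    """
    inventory, total_cost = 0.0, 0.0
    for t, d in enumerate(demands):
        if inventory < s[t]:                  # review: order up to S[t]
            total_cost += K
            inventory = S[t]
        inventory -= d                        # demand realizes after delivery
        total_cost += h * max(inventory, 0) + b * max(-inventory, 0)
    return total_cost

# Example: seasonal demand with higher levels in high-demand periods (made up).
rng = np.random.default_rng(1)
demand = rng.poisson([20, 40, 60, 40, 20, 10])
print(simulate_sS(s=[15, 30, 45, 30, 15, 8], S=[60, 90, 120, 90, 60, 30], demands=demand))
```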
This paper discusses the odds problem, proposed by Bruss in 2000, and its variants. A recurrence relation called a dynamic programming (DP) equation is used to find an optimal stopping policy of the odds problem and its variants. In 2013, Buchbinder, Jain, and Singh proposed a linear programming (LP) formulation for finding an optimal stopping policy of the classical secretary problem, which is a special case of the odds problem. Their linear program, which maximizes the probability of a win, differs from the DP equations that have long been known. This paper shows that an ordinary DP equation is a modification of the dual of a linear program that includes the LP formulation proposed by Buchbinder, Jain, and Singh.
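For context, a sketch of the classical odds-algorithm solution of Bruss's problem is given below: sum the odds of the stages backwards until they reach one, then stop at the first success from that stage onward. This illustrates the threshold structure produced by the DP equation; it is not the LP formulation discussed in the paper, and the example probabilities are made up (each assumed strictly below one).

```python
import math

def odds_algorithm(p):
    """Bruss's odds algorithm (sketch): threshold index and win probability for
    stopping on the last success of independent indicators with success
    probabilities p[0], ..., p[n-1], each assumed to be strictly less than 1."""
    n = len(p)
    odds = [pi / (1.0 - pi) for pi in p]
    # Largest index s (0-based) at which the odds summed from the tail reach 1;
    # if the total never reaches 1, start stopping from the first stage.
    r_sum, s = 0.0, 0
    for k in range(n - 1, -1, -1):
        r_sum += odds[k]
        if r_sum >= 1.0:
            s = k
            break
    # Optimal rule: stop at the first success observed at stage s or later.
    win_prob = math.prod(1.0 - p[k] for k in range(s, n)) * sum(odds[s:])
    return s, win_prob

print(odds_algorithm([0.1, 0.2, 0.25, 0.3, 0.5]))   # made-up probabilities
```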
We introduce the $L_p$ Traveling Salesman Problem ($L_p$-TSP), given by an origin, a set of destinations, and underlying distances. The objective is to schedule a destination visit sequence for a traveler of unit speed so as to minimize the Minkowski $p$-norm of the resulting vector of visit/service times. For $p = \infty$ the problem becomes a path variant of the TSP, and for $p = 1$ it defines the Traveling Repairman Problem (TRP), both at the center of classical combinatorial optimization. We provide an approximation-preserving polynomial-time reduction of $L_p$-TSP to the segmented-TSP problem [Sitters 14] and further study the case of $p = 2$, in which the cost due to a delay in service is quadratic in time; we term this the Traveling Firefighter Problem (TFP). We also study the all-norm TSP problem [Golovin et al. 08], in which the objective is to find a route that is (approximately) optimal with respect to the minimization of any norm of the visit times, and improve the corresponding (in)approximability bounds on metric spaces.
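As a small worked example of the objective, the sketch below evaluates the Minkowski $p$-norm of the visit times of a fixed route for a unit-speed traveler; the instance and the route are made up, and no optimization is performed.

```python
import numpy as np

def lp_cost(route, dist, origin=0, p=2):
    """Minkowski p-norm of the visit times along a given route (toy sketch).

    route : order in which the destinations are visited (indices into dist)
    dist  : matrix of travel times, dist[i, j]
    p     : 1 -> Traveling Repairman, 2 -> Traveling Firefighter, np.inf -> path TSP
    """
    t, here, visit_times = 0.0, origin, []
    for dest in route:
        t += dist[here, dest]          # unit-speed traveler: time equals distance
        visit_times.append(t)
        here = dest
    return np.linalg.norm(visit_times, ord=p)

# Three destinations on a line at distances 1, 2 and 4 from the origin (made up).
pos = np.array([0.0, 1.0, 2.0, 4.0])
dist = np.abs(pos[:, None] - pos[None, :])
for p in (1, 2, np.inf):
    print(p, lp_cost(route=[1, 2, 3], dist=dist, p=p))
```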