Asset management aims to keep the power system in working condition. It requires extensive coordination between multiple entities and long-term planning, often months in advance. In this work we introduce a mid-term asset management formulation as a stochastic optimization problem that includes three hierarchical layers of decision making, namely the mid-term, short-term, and real-time layers. We devise a tractable scenario approximation technique for efficiently assessing the complex implications a maintenance schedule imposes on a power system. This is done using efficient Monte-Carlo simulations that trade off accuracy against tractability. We then present our implementation of a distributed scenario-based optimization algorithm for solving our formulation, and use an updated PJM 5-bus system to show a solution that is cheaper than other maintenance heuristics likely to be considered by TSOs.
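As a rough illustration of the scenario-based evaluation idea (not the authors' formulation), the sketch below samples hypothetical component-outage scenarios and estimates the expected cost of a candidate maintenance schedule via a sample-average approximation; the component names, failure probabilities, and cost model are all invented for the example.

```python
import random

# Hypothetical data: per-component failure probability and the extra cost
# incurred when a component fails while it is NOT covered by maintenance.
FAILURE_PROB = {"line_1": 0.05, "line_2": 0.08, "gen_3": 0.02}
FAILURE_COST = {"line_1": 120.0, "line_2": 200.0, "gen_3": 500.0}
MAINTENANCE_COST = 50.0  # assumed flat cost per maintained component


def sample_scenario(rng):
    """Draw one Monte-Carlo scenario: the set of components that fail."""
    return {c for c, p in FAILURE_PROB.items() if rng.random() < p}


def scenario_cost(schedule, failed):
    """Cost of one scenario under a maintenance schedule (set of components)."""
    cost = MAINTENANCE_COST * len(schedule)
    for c in failed:
        if c not in schedule:          # unmaintained components fail at full cost
            cost += FAILURE_COST[c]
    return cost


def expected_cost(schedule, n_scenarios=10_000, seed=0):
    """Sample-average approximation: more scenarios trade speed for accuracy."""
    rng = random.Random(seed)
    total = sum(scenario_cost(schedule, sample_scenario(rng))
                for _ in range(n_scenarios))
    return total / n_scenarios


if __name__ == "__main__":
    for schedule in [set(), {"gen_3"}, {"line_2", "gen_3"}]:
        print(sorted(schedule), round(expected_cost(schedule), 2))
```

The number of sampled scenarios is the knob that trades accuracy for tractability in this simplified setting.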
Developing a software-intensive product or service can be a significant undertaking, associated with unique challenges in each project stage, from inception to development, delivery, maintenance, and evolution. Each stage results in artefacts that are crucial for the project outcome, such as source code and supporting deliverables, e.g., documentation. Artefacts that have inherent value for the organisation are assets, and as assets, they are subject to degradation. This degradation occurs as artefacts age and can be immediate or unfold slowly over a period of time, similar to the concept of technical debt. One challenge with the concept of assets is that it seems not to be well understood and is generally limited to a few types of assets (often code-based), overlooking other equally important ones. To bridge this gap, we have performed a study to formulate a structured taxonomy of assets. We use empirical data collected through industrial workshops and a literature review to ground the taxonomy. The taxonomy serves as a foundation for concepts like asset degradation and asset management. It can help contextualise, homogenise, and extend the concept of technical debt, and serves as a conceptual framework for better identification, discussion, and utilisation of assets.
As a typical vehicle cyber-physical system (V-CPS), connected automated vehicles have attracted increasing attention in recent years. This paper focuses on decision-making (DM) strategies for autonomous vehicles in a connected environment. First, the highway DM problem is formulated, wherein the vehicles can exchange information via wireless networking. Then, two classical reinforcement learning (RL) algorithms, Q-learning and Dyna, are leveraged to derive the DM strategies in a predefined driving scenario. Finally, the control performance of the derived DM policies in terms of safety and efficiency is analyzed. Furthermore, the inherent differences between the two RL algorithms, as reflected in the resulting DM strategies, are discussed.
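For readers unfamiliar with the value update at the core of Q-learning, here is a minimal tabular sketch on an invented two-lane toy environment; the state encoding, dynamics, and reward shaping are placeholders, not the paper's connected-vehicle setup.

```python
import random
from collections import defaultdict

ACTIONS = ["keep_lane", "change_lane"]


def step(state, action, rng):
    """Toy dynamics, invented for illustration: state = (lane, gap_to_lead_vehicle)."""
    lane, gap = state
    reward = 1.0                                    # progress reward for each survived step
    if action == "change_lane":
        lane, gap = 1 - lane, rng.choice([1, 2, 3])
        reward -= 0.3                               # small comfort penalty for changing lanes
    else:
        gap -= 1
    if gap <= 0:                                    # collision with the lead vehicle ends the episode
        return (lane, 0), -10.0, True
    return (lane, gap), reward, False


def q_learning(episodes=2000, max_steps=50, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)                          # Q[(state, action)]
    for _ in range(episodes):
        state = (0, rng.choice([1, 2, 3]))
        for _ in range(max_steps):
            # epsilon-greedy action selection
            if rng.random() < eps:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, reward, done = step(state, action, rng)
            best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
            # one-step temporal-difference (Q-learning) update
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = nxt
            if done:
                break
    return Q
```

Dyna extends this loop with planning updates drawn from a learned model; that extension is omitted here.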
Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment. It captures the online decision-making problems that each player faces in an extensive-form game, as well as Markov decision processes and partially observable Markov decision processes where the agent conditions on observed history. Over the past decade, considerable effort has gone into designing online optimization methods for TFSDM. Virtually all of that work has been in the full-feedback setting, where the agent has access to counterfactuals, that is, information on what would have happened had the agent chosen a different action at any decision node. Little is known about the bandit setting, where that assumption is reversed (no counterfactual information is available), despite this latter setting being well understood for almost 20 years in one-shot decision making. In this paper, we give the first algorithm for the bandit linear optimization problem for TFSDM that offers both (i) linear-time iterations (in the size of the decision tree) and (ii) $O(\sqrt{T})$ cumulative regret in expectation compared to any fixed strategy, at all times $T$. This is made possible by new results that we derive, which may be of independent interest as well: 1) geometry of the dilated entropy regularizer, 2) autocorrelation matrix of the natural sampling scheme for sequence-form strategies, 3) construction of an unbiased estimator for linear losses for sequence-form strategies, and 4) a refined regret analysis for mirror descent when using the dilated entropy regularizer.
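The paper's algorithm operates over sequence-form strategy sets; as a much simpler point of reference, the sketch below shows the analogous one-shot construction on the probability simplex: sample an action, build an importance-weighted unbiased estimate of the linear loss, and take a mirror-descent step with the entropy regularizer (which reduces to multiplicative weights). The parameters and loss values are illustrative, and no exploration mixing is added for simplicity.

```python
import math
import random


def bandit_mirror_descent(loss_fn, n_actions, T, eta=0.05, seed=0):
    """One-shot bandit linear optimization with entropy-regularized mirror descent.

    loss_fn(t, action) returns the loss (assumed in [0, 1]) of the chosen action;
    only that single entry of the loss vector is observed (bandit feedback).
    """
    rng = random.Random(seed)
    x = [1.0 / n_actions] * n_actions              # uniform initial strategy
    total_loss = 0.0
    for t in range(T):
        # sample an action from the current strategy
        a = rng.choices(range(n_actions), weights=x)[0]
        loss_a = loss_fn(t, a)
        total_loss += loss_a
        # importance-weighted unbiased estimator of the full loss vector
        est = [0.0] * n_actions
        est[a] = loss_a / x[a]
        # mirror-descent step with the (negative) entropy regularizer:
        # x_i <- x_i * exp(-eta * est_i), then renormalize
        x = [xi * math.exp(-eta * li) for xi, li in zip(x, est)]
        z = sum(x)
        x = [xi / z for xi in x]
    return x, total_loss


if __name__ == "__main__":
    # Action 2 has the smallest loss; the strategy should concentrate on it.
    losses = [0.9, 0.6, 0.1, 0.8]
    strategy, _ = bandit_mirror_descent(lambda t, a: losses[a], 4, T=20_000)
    print([round(p, 3) for p in strategy])
```

The TFSDM setting replaces the simplex with the sequence-form polytope and the entropy regularizer with its dilated counterpart, which is where the paper's new results come in.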
Autonomous parking technology is a key concept within autonomous driving research. This paper proposes an imaginative autonomous parking algorithm to solve issues concerned with parking. The proposed algorithm consists of three parts: an imaginative model for anticipating results before parking, an improved rapidly-exploring random tree (RRT) for planning a feasible trajectory from a given start point to a parking lot, and a path smoothing module for optimizing the efficiency of parking tasks. Our algorithm is based on a real kinematic vehicle model, which makes it better suited to deployment on real autonomous cars. Furthermore, due to the introduction of the imagination mechanism, the processing speed of our algorithm is ten times faster than that of traditional methods, permitting real-time planning. To evaluate the algorithm's effectiveness, we compared it with traditional RRT in three different parking scenarios. The results show that our algorithm is more stable than traditional RRT and performs better in terms of efficiency and quality.
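As background on the planning component, here is a bare-bones RRT for a 2D point robot; the kinematic vehicle model, imagination mechanism, and path smoothing described in the abstract are omitted, and the workspace bounds and obstacle layout are made up. Edge collision checking is also skipped for brevity.

```python
import math
import random

OBSTACLES = [((4.0, 4.0), 1.5)]        # list of (center, radius) circles, invented
X_MAX = Y_MAX = 10.0


def collision_free(p):
    return all(math.dist(p, c) > r for c, r in OBSTACLES)


def rrt(start, goal, max_iters=5000, step=0.5, goal_tol=0.5, seed=0):
    rng = random.Random(seed)
    nodes, parent = [start], {0: None}
    for _ in range(max_iters):
        sample = (rng.uniform(0, X_MAX), rng.uniform(0, Y_MAX))
        # extend the nearest existing node a fixed step toward the sample
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        near = nodes[i]
        d = math.dist(near, sample)
        new = (near[0] + step * (sample[0] - near[0]) / d,
               near[1] + step * (sample[1] - near[1]) / d)
        if not collision_free(new):
            continue
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:   # goal reached: back-trace the path
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None


if __name__ == "__main__":
    print(rrt((1.0, 1.0), (9.0, 9.0)))
```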
Value-based methods for reinforcement learning lack generally applicable ways to derive behavior from a value function. Many approaches involve approximate value iteration (e.g., $Q$-learning) and act greedily with respect to the estimates, with some degree of added entropy to ensure that the state space is sufficiently explored. Behavior based on explicit greedification assumes that the values reflect those of \textit{some} policy, over which the greedy policy will be an improvement. However, value iteration can produce value functions that do not correspond to \textit{any} policy. This is especially relevant in the function-approximation regime, when the true value function cannot be perfectly represented. In this work, we explore the use of \textit{inverse policy evaluation}, the process of solving for a likely policy given a value function, for deriving behavior from a value function. We provide theoretical and empirical results to show that inverse policy evaluation, combined with an approximate value iteration algorithm, is a feasible method for value-based control.
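One way to read "solving for a likely policy given a value function" is: find a stochastic policy whose exact value function is as close as possible to the given one. The sketch below illustrates that interpretation on a tiny made-up MDP with known dynamics, using exact policy evaluation and a crude finite-difference descent over softmax logits; it is not the authors' algorithm, and every number in it is a placeholder.

```python
import numpy as np

# Tiny invented MDP: 3 states, 2 actions, dynamics P[a, s, s'] and rewards R[a, s].
P = np.array([[[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],
              [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.0, 0.1]]])
R = np.array([[0.0, 0.0, 1.0],
              [0.1, 0.1, 0.1]])
GAMMA = 0.9


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def policy_value(logits):
    """Exact policy evaluation: V^pi = (I - gamma * P_pi)^{-1} r_pi."""
    pi = softmax(logits)                               # (n_states, n_actions)
    P_pi = np.einsum("sa,ast->st", pi, P)              # state-to-state transition matrix
    r_pi = np.einsum("sa,as->s", pi, R)                # expected one-step reward
    return np.linalg.solve(np.eye(len(r_pi)) - GAMMA * P_pi, r_pi)


def inverse_policy_evaluation(v_target, iters=3000, lr=0.05, eps=1e-4):
    """Find softmax-policy logits whose exact value function is close to v_target."""
    logits = np.zeros((3, 2))
    loss = lambda th: float(np.sum((policy_value(th) - v_target) ** 2))
    for _ in range(iters):
        grad = np.zeros_like(logits)
        for idx in np.ndindex(logits.shape):           # crude numerical gradient
            bump = np.zeros_like(logits)
            bump[idx] = eps
            grad[idx] = (loss(logits + bump) - loss(logits - bump)) / (2 * eps)
        logits -= lr * grad
    return softmax(logits), policy_value(logits)


if __name__ == "__main__":
    # A target value function that need not correspond exactly to any policy.
    v_target = np.array([2.0, 3.0, 6.0])
    pi, v = inverse_policy_evaluation(v_target)
    print(np.round(pi, 3), np.round(v, 2))
```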