No Arabic abstract
This work considers the problem of control and resource scheduling in networked systems. We present DIRA, a Deep reinforcement learning based Iterative Resource Allocation algorithm, which is scalable and control-aware. Our algorithm is tailored towards large-scale problems where control and scheduling need to act jointly to optimize performance. DIRA can be used to schedule general time-domain optimization based controllers. In the present work, we focus on control designs based on suitably adapted linear quadratic regulators. We apply our algorithm to networked systems with correlated fading communication channels. Our simulations show that DIRA scales well to large scheduling problems.
We consider networked control systems consisting of multiple independent controlled subsystems, operating over a shared communication network. Such systems are ubiquitous in cyber-physical systems, Internet of Things, and large-scale industrial systems. In many large-scale settings, the size of the communication network is smaller than the size of the system. In consequence, scheduling issues arise. The main contribution of this paper is to develop a deep reinforcement learning-based emph{control-aware} scheduling (textsc{DeepCAS}) algorithm to tackle these issues. We use the following (optimal) design strategy: First, we synthesize an optimal controller for each subsystem; next, we design a learning algorithm that adapts to the chosen subsystems (plants) and controllers. As a consequence of this adaptation, our algorithm finds a schedule that minimizes the emph{control loss}. We present empirical results to show that textsc{DeepCAS} finds schedules with better performance than periodic ones.
In many Cyber-Physical Systems, we encounter the problem of remote state estimation of geographically distributed and remote physical processes. This paper studies the scheduling of sensor transmissions to estimate the states of multiple remote, dynamic processes. Information from the different sensors have to be transmitted to a central gateway over a wireless network for monitoring purposes, where typically fewer wireless channels are available than there are processes to be monitored. For effective estimation at the gateway, the sensors need to be scheduled appropriately, i.e., at each time instant one needs to decide which sensors have network access and which ones do not. To address this scheduling problem, we formulate an associated Markov decision process (MDP). This MDP is then solved using a Deep Q-Network, a recent deep reinforcement learning algorithm that is at once scalable and model-free. We compare our scheduling algorithm to popular scheduling algorithms such as round-robin and reduced-waiting-time, among others. Our algorithm is shown to significantly outperform these algorithms for many example scenarios.
Multicasting in wireless systems is a natural way to exploit the redundancy in user requests in a Content Centric Network. Power control and optimal scheduling can significantly improve the wireless multicast networks performance under fading. However, the model based approaches for power control and scheduling studied earlier are not scalable to large state space or changing system dynamics. In this paper, we use deep reinforcement learning where we use function approximation of the Q-function via a deep neural network to obtain a power control policy that matches the optimal policy for a small network. We show that power control policy can be learnt for reasonably large systems via this approach. Further we use multi-timescale stochastic optimization to maintain the average power constraint. We demonstrate that a slight modification of the learning algorithm allows tracking of time varying system statistics. Finally, we extend the multi-timescale approach to simultaneously learn the optimal queueing strategy along with power control. We demonstrate scalability, tracking and cross layer optimization capabilities of our algorithms via simulations. The proposed multi-timescale approach can be used in general large state space dynamical systems with multiple objectives and constraints, and may be of independent interest.
Passivity-based control (PBC) for port-Hamiltonian systems provides an intuitive way of achieving stabilization by rendering a system passive with respect to a desired storage function. However, in most instances the control law is obtained without any performance considerations and it has to be calculated by solving a complex partial differential equation (PDE). In order to address these issues we introduce a reinforcement learning approach into the energy-balancing passivity-based control (EB-PBC) method, which is a form of PBC in which the closed-loop energy is equal to the difference between the stored and supplied energies. We propose a technique to parameterize EB-PBC that preserves the systemss PDE matching conditions, does not require the specification of a global desired Hamiltonian, includes performance criteria, and is robust to extra non-linearities such as control input saturation. The parameters of the control law are found using actor-critic reinforcement learning, enabling learning near-optimal control policies satisfying a desired closed-loop energy landscape. The advantages are that near-optimal controllers can be generated using standard energy shaping techniques and that the solutions learned can be interpreted in terms of energy shaping and damping injection, which makes it possible to numerically assess stability using passivity theory. From the reinforcement learning perspective, our proposal allows for the class of port-Hamiltonian systems to be incorporated in the actor-critic framework, speeding up the learning thanks to the resulting parameterization of the policy. The method has been successfully applied to the pendulum swing-up problem in simulations and real-life experiments.
We consider a multicast scheme recently proposed for a wireless downlink in [1]. It was shown earlier that power control can significantly improve its performance. However for this system, obtaining optimal power control is intractable because of a very large state space. Therefore in this paper we use deep reinforcement learning where we use function approximation of the Q-function via a deep neural network. We show that optimal power control can be learnt for reasonably large systems via this approach. The average power constraint is ensured via a Lagrange multiplier, which is also learnt. Finally, we demonstrate that a slight modification of the learning algorithm allows the optimal control to track the time varying system statistics.