In this paper, the real-time deployment of unmanned aerial vehicles (UAVs) as flying base stations (BSs) that optimize the throughput of mobile users in UAV networks is investigated. The problem is formulated as a time-varying mixed-integer non-convex programming (MINP) problem, for which conventional optimization techniques struggle to find an optimal solution in a short time. Hence, we propose an actor-critic-based (AC-based) deep reinforcement learning (DRL) method that finds near-optimal UAV positions at every moment. In the proposed method, the iterative search for a solution at a given moment is modeled as a Markov decision process (MDP). To handle infinite state and action spaces and to improve the robustness of the decision process, two neural networks (NNs) are configured to evaluate UAV position adjustments and to make decisions, respectively. Simulation results show that the proposed method outperforms three benchmarks, namely a heuristic algorithm, sequential least-squares programming, and fixed UAVs, in terms of throughput at every moment in UAV networks.
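The paper's code is not given here; as a rough illustration of the two-network arrangement the abstract describes, the sketch below pairs a decision (actor) network that proposes a UAV position adjustment with an evaluation (critic) network whose temporal-difference (TD) error scores that adjustment. State and action dimensions, network sizes, and learning rates are assumptions made for the sketch, not values from the paper.

```python
# Minimal actor-critic sketch (illustrative only, not the paper's implementation).
import torch
import torch.nn as nn

STATE_DIM = 8    # e.g., stacked UAV and user coordinates (assumed)
ACTION_DIM = 3   # e.g., a 3D position adjustment (assumed)

class Actor(nn.Module):
    """Decision network: maps the current state to a bounded position adjustment."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACTION_DIM), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Evaluation network: estimates the value of a state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, s):
        return self.net(s)

actor, critic = Actor(), Critic()
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def select_action(state, sigma=0.1):
    """Sample an exploratory adjustment around the actor's output."""
    with torch.no_grad():
        return torch.distributions.Normal(actor(state), sigma).sample()

def update(state, action, reward, next_state, gamma=0.99, sigma=0.1):
    """One actor-critic step on a single transition."""
    # Critic: regress V(s) toward the one-step TD target.
    with torch.no_grad():
        td_target = reward + gamma * critic(next_state)
    critic_loss = (td_target - critic(state)).pow(2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: policy-gradient step, with the TD error as the advantage signal.
    advantage = (td_target - critic(state)).detach().squeeze(-1)
    dist = torch.distributions.Normal(actor(state), sigma)
    actor_loss = (-dist.log_prob(action).sum(-1) * advantage).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```

The division of labor mirrors the abstract: the critic evaluates position adjustments while the actor makes the decisions, and both operate on continuous spaces without discretizing states or actions.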
We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows.
We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens.
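The discrepancy this snippet refers to is commonly called exposure bias: during log-likelihood training the model conditions on ground-truth prefixes, while at test time it conditions on its own, possibly erroneous, predictions. The sketch below contrasts the two modes for a hypothetical GRU decoder; the vocabulary size, hidden size, and the choice of token id 0 as <bos> are illustrative assumptions.

```python
# Contrast of teacher-forced training and free-running decoding (illustrative).
import torch
import torch.nn as nn

VOCAB, HIDDEN = 100, 32
embed = nn.Embedding(VOCAB, HIDDEN)
rnn = nn.GRUCell(HIDDEN, HIDDEN)
readout = nn.Linear(HIDDEN, VOCAB)

def step(token, h):
    h = rnn(embed(token), h)
    return readout(h), h

def train_mode(target):
    """Teacher forcing: every input token comes from the ground truth."""
    h = torch.zeros(1, HIDDEN)
    prev = torch.zeros(1, dtype=torch.long)   # assumed <bos> token id
    loss = torch.zeros(())
    for t in range(target.size(0)):
        logits, h = step(prev, h)
        loss = loss + nn.functional.cross_entropy(logits, target[t:t + 1])
        prev = target[t:t + 1]                # condition on the *true* token
    return loss

def test_mode(length):
    """Free running: every input token is the model's own previous guess."""
    h = torch.zeros(1, HIDDEN)
    prev = torch.zeros(1, dtype=torch.long)
    out = []
    for _ in range(length):
        logits, h = step(prev, h)
        prev = logits.argmax(-1)              # condition on the *guessed* token
        out.append(prev.item())
    return out

# Example: loss = train_mode(torch.tensor([5, 17, 3])); tokens = test_mode(3)
```

An actor-critic training signal, as the snippet proposes, lets the network learn in the free-running regime it will actually face at test time.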
Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing the discounted return.
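For concreteness, the score function in this family of methods is typically the expected discounted return, and its gradient admits the likelihood-ratio (REINFORCE) form; the notation below is generic, not taken from the snippet itself:

```latex
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\Big[\sum_{t \ge 0} \gamma^{t} r_t\Big],
\qquad
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\Big[\sum_{t \ge 0}
      \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\Big],
\qquad
G_t = \sum_{k \ge t} \gamma^{k} r_k .
```

Gradient ascent on J(\theta) then improves the policy parameters directly, which is the approach the snippet alludes to.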
An error in the value function inevitably causes an overestimation phenomenon and has a negative impact on the convergence of the algorithms. To mitigate the negative effects of the approximation error, we propose Error Controlled Actor-Critic, which ensures that the approximation error of the value function is confined within a bounded range.
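The causal link asserted here is standard: even zero-mean approximation error inflates greedy value estimates, because the expectation of a maximum is at least the maximum of expectations, E[max_a Q̂(a)] ≥ max_a E[Q̂(a)]. A small numerical illustration with arbitrary numbers, unrelated to the paper's method:

```python
# Zero-mean noise on value estimates still biases the greedy maximum upward.
import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(5)                              # five equally good actions
noise = rng.normal(0.0, 1.0, size=(100_000, 5))   # zero-mean approximation error
mean_estimated_max = (true_q + noise).max(axis=1).mean()

print(f"true max value:      {true_q.max():.3f}")        # 0.000
print(f"mean estimated max:  {mean_estimated_max:.3f}")  # ~1.16, overestimation
```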
The exploration mechanism used by a Deep Reinforcement Learning (RL) agent plays a key role in determining its sample efficiency. Thus, improving over random exploration is crucial for solving long-horizon tasks with sparse rewards. We propose to leverage…