
Optimized Trajectory Design in UAV Based Cellular Networks for 3D Users: A Double Q-Learning Approach

Published by: Xuanlin Liu
Publication date: 2019
Research field: Informatics Engineering
Research language: English





In this paper, the problem of trajectory design of unmanned aerial vehicles (UAVs) for maximizing the number of satisfied users is studied in a UAV-based cellular network where the UAV works as a flying base station that serves users, and each user indicates its satisfaction in terms of completion of its data request within a maximum allowable waiting time. The trajectory design is formulated as an optimization problem whose goal is to maximize the number of satisfied users. To solve this problem, a machine learning framework based on the double Q-learning algorithm is proposed. The algorithm enables the UAV to find the optimal trajectory that maximizes the number of satisfied users. Compared to traditional learning algorithms, such as Q-learning, which selects and evaluates the action using the same Q-table, the proposed algorithm decouples the selection from the evaluation and thereby avoids the overestimation that leads to sub-optimal policies. Simulation results show that the proposed algorithm can achieve up to 19.4% and 14.1% gains in terms of the number of satisfied users compared to the random algorithm and the Q-learning algorithm, respectively.
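The decoupling of action selection from action evaluation described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the corridor environment, the reward of 1 at the goal, the hyperparameters, and the names `double_q_update` and `train_corridor` are all assumptions made for illustration, standing in for the UAV's states, moves, and satisfied-user reward.

```python
import random


def double_q_update(q_a, q_b, state, action, reward, next_state,
                    alpha=0.1, gamma=0.9, update_a=True):
    """Apply one double Q-learning step; tables are dicts of per-action lists."""
    # The table being updated selects the action, the other table evaluates it.
    select, evaluate = (q_a, q_b) if update_a else (q_b, q_a)
    n_actions = len(select[next_state])
    # Selection: greedy next action according to the table being updated.
    best_next = max(range(n_actions), key=lambda a: select[next_state][a])
    # Evaluation: value of that action according to the other table.
    target = reward + gamma * evaluate[next_state][best_next]
    select[state][action] += alpha * (target - select[state][action])
    return select[state][action]


def train_corridor(episodes=300, n_states=5, epsilon=0.2, seed=0):
    """Toy stand-in environment (an assumption): a 1-D corridor with the goal
    at the right end; action 0 moves left, action 1 moves right."""
    rng = random.Random(seed)
    q_a = {s: [0.0, 0.0] for s in range(n_states)}
    q_b = {s: [0.0, 0.0] for s in range(n_states)}
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # Exploit: act greedily on the sum of both tables.
                totals = [q_a[state][a] + q_b[state][a] for a in range(2)]
                action = max(range(2), key=lambda a: totals[a])
            step = 1 if action == 1 else -1
            next_state = min(max(state + step, 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # A fair coin decides which table is updated at this step.
            double_q_update(q_a, q_b, state, action, reward, next_state,
                            update_a=rng.random() < 0.5)
            state = next_state
            if state == goal:
                break
    return q_a, q_b
```

In standard Q-learning the same table would supply both `best_next` and its value, so noise that inflates one entry is picked by the `max` and then reused as the target; using the second table for the value is what removes that bias.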


Read also

Yang Wang, Zhen Gao, Jun Zhang (2021)
In this paper, we investigate an unmanned aerial vehicle (UAV)-assisted Internet-of-Things (IoT) system in a sophisticated three-dimensional (3D) environment, where the UAV's trajectory is optimized to efficiently collect data from multiple IoT ground nodes. Unlike existing approaches, which focus only on a simplified two-dimensional scenario and assume the availability of perfect channel state information (CSI), this paper considers a practical 3D urban environment with imperfect CSI, where the UAV's trajectory is designed to minimize the data collection completion time subject to practical throughput and flight movement constraints. Specifically, inspired by state-of-the-art deep reinforcement learning approaches, we leverage the twin-delayed deep deterministic policy gradient (TD3) to design the UAV's trajectory and present a TD3-based trajectory design for completion time minimization (TD3-TDCTM) algorithm. In particular, we introduce an additional piece of information, i.e., the merged pheromone, to represent the state information of the UAV and the environment as a reference for the reward, which facilitates the algorithm design. By taking the service statuses of the IoT nodes, the UAV's position, and the merged pheromone as input, the proposed algorithm can continuously and adaptively learn how to adjust the UAV's movement strategy. By interacting with the external environment in the corresponding Markov decision process, the proposed algorithm can achieve a near-optimal navigation strategy. Our simulation results show the superiority of the proposed TD3-TDCTM algorithm over three conventional non-learning-based baseline methods.
In this paper, an unmanned aerial vehicle (UAV)-assisted wireless network is considered in which a battery-constrained UAV is assumed to move towards energy-constrained ground nodes to receive status updates about their observed processes. The UAV's flight trajectory and the scheduling of status updates are jointly optimized with the objective of minimizing the normalized weighted sum of Age of Information (NWAoI) values for different physical processes at the UAV. The problem is first formulated as a mixed-integer program. Then, for a given scheduling policy, a convex optimization-based solution is proposed to derive the UAV's optimal flight trajectory and the time instants of updates. However, finding the optimal scheduling policy is challenging due to the combinatorial nature of the formulated problem. Therefore, to complement the proposed convex optimization-based solution, a finite-horizon Markov decision process (MDP) is used to find the optimal scheduling policy. Since the state space of the MDP is extremely large, a novel neural combinatorial-based deep reinforcement learning (NCRL) algorithm using a deep Q-network (DQN) is proposed to obtain the optimal policy. However, for large-scale scenarios with numerous nodes, the DQN architecture can no longer learn the optimal scheduling policy efficiently. Motivated by this, a long short-term memory (LSTM)-based autoencoder is proposed to map the state space to a fixed-size vector representation in such large-scale scenarios. A lower bound on the minimum NWAoI is analytically derived, which provides system design guidelines on the appropriate choice of importance weights for different nodes. The numerical results also demonstrate that the proposed NCRL approach can significantly improve the achievable NWAoI per process compared to baseline policies, such as weight-based and discretized-state DQN policies.
Shuowen Zhang, Rui Zhang (2019)
In this paper, we study the trajectory design for a cellular-connected unmanned aerial vehicle (UAV) with given initial and final locations, while communicating with the ground base stations (GBSs) along its flight. We consider delay-limited communications between the UAV and its associated GBSs, where a given signal-to-noise ratio (SNR) target needs to be satisfied at the receiver. However, in practice, due to various factors such as quality-of-service (QoS) requirements, GBS availability, and UAV mobility constraints, the SNR target may not be met during certain time periods of the flight, each termed an outage duration. In this paper, we aim to optimize the UAV trajectory to minimize its mission completion time, subject to a constraint on the maximum tolerable outage duration in its flight. To tackle this non-convex problem, we first transform it into a more tractable form and thereby reveal some useful properties of the optimal trajectory solution. Based on these properties, we then further simplify the problem and propose efficient algorithms to check the feasibility of the problem as well as to obtain its optimal and high-quality suboptimal solutions, by leveraging graph theory and convex optimization techniques. Numerical results show that our proposed trajectory designs outperform the conventional method based on dynamic programming, in terms of both performance and complexity.
In this paper, we study a cellular-enabled unmanned aerial vehicle (UAV) communication system consisting of one UAV and multiple ground base stations (GBSs). The UAV has a mission of flying from an initial location to a final location, during which it needs to maintain a reliable wireless connection with the cellular network by associating with one of the GBSs at each time instant. We aim to minimize the UAV mission completion time by optimizing its trajectory, subject to a quality-of-connectivity constraint on the GBS-UAV link specified by a minimum received signal-to-noise ratio (SNR) target, which needs to be satisfied throughout the mission. This problem is non-convex and difficult to solve optimally. We first propose an effective approach to check its feasibility based on graph connectivity verification. Then, by examining the GBS-UAV association sequence during the UAV mission, we obtain useful insights on the optimal UAV trajectory, based on which an efficient algorithm is proposed to find an approximate solution to the trajectory optimization problem by leveraging techniques in convex optimization and graph theory. Numerical results show that our proposed trajectory design achieves near-optimal performance.
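One plausible reading of a feasibility check "based on graph connectivity verification" is sketched below. The abstract does not give the construction, so everything here is an assumption: the SNR target is modelled as a fixed horizontal coverage radius around each GBS, two GBSs are treated as adjacent when their coverage disks overlap, and the name `is_feasible` is hypothetical. Under those assumptions, a trajectory that stays covered exists only if some GBS covering the start is connected to some GBS covering the destination.

```python
from collections import deque
from math import dist


def is_feasible(start, goal, stations, radius):
    """Return True if a continuously covered path from start to goal can exist.

    Assumption: a GBS meets the SNR target inside a disk of the given radius,
    so the UAV can hand over between two GBSs only where their disks overlap.
    """
    covers_start = [i for i, g in enumerate(stations) if dist(start, g) <= radius]
    covers_goal = {i for i, g in enumerate(stations) if dist(goal, g) <= radius}
    if not covers_start or not covers_goal:
        return False  # an endpoint is outside every coverage disk
    # Breadth-first search over the GBS overlap graph.
    seen = set(covers_start)
    queue = deque(covers_start)
    while queue:
        i = queue.popleft()
        if i in covers_goal:
            return True
        for j, g in enumerate(stations):
            # Two disks of equal radius overlap when centres are within 2 * radius.
            if j not in seen and dist(stations[i], g) <= 2 * radius:
                seen.add(j)
                queue.append(j)
    return False
```

For example, with GBSs at `(0, 0)`, `(3, 0)` and `(6, 0)` and a radius of 2, a flight from `(-1, 0)` to `(7, 0)` passes the check, whereas removing the middle GBS leaves a coverage gap and the check fails.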
Yao Tang, Man Hon Cheung (2019)
Unmanned aerial vehicles (UAVs) can enhance the performance of cellular networks, due to their high mobility and efficient deployment. In this paper, we present a first study on how user mobility affects the UAVs' trajectories in a multiple-UAV-assisted wireless communication system. Specifically, we consider UAVs deployed as aerial base stations to serve ground users who move between different regions. We maximize the throughput of ground users in the downlink communication by optimizing the UAVs' trajectories, while taking into account the impact of user mobility, propulsion energy consumption, and the UAVs' mutual interference. We formulate the problem as a route selection problem in an acyclic directed graph. Each vertex represents a task associated with a reward on the average user throughput at a region-time point, while each edge is associated with a cost on the propulsion energy consumption during flying and hovering. For the centralized trajectory design, we first propose the shortest path scheme that determines the optimal trajectory for the single-UAV case. We also propose the centralized route selection (CRS) scheme to systematically compute the optimal trajectories for the more general multiple-UAV case. Due to the NP-hardness of the centralized problem, we consider the distributed trajectory design in which each UAV selects its trajectory autonomously and propose the distributed route selection (DRS) scheme, which converges to a pure strategy Nash equilibrium within a finite number of iterations.
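The single-UAV route selection in an acyclic directed graph can be sketched as a dynamic program over a topological order. This is an illustrative sketch, not the paper's scheme: the abstract does not state how vertex rewards and edge costs are combined, so the objective used here (sum of vertex rewards minus sum of edge costs) and the name `best_route` are assumptions. Maximizing this net reward is equivalent to a shortest-path computation with negated weights.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def best_route(rewards, edges, source, sink):
    """Find the route from source to sink with the largest net reward.

    rewards: dict vertex -> reward collected when visiting it
    edges:   dict (u, v) -> energy cost of flying from u to v
    Assumed objective: total reward minus total cost.
    """
    preds = {v: set() for v in rewards}
    for (u, v) in edges:
        preds[v].add(u)
    best = {source: rewards[source]}  # best net reward found for each vertex
    parent = {}
    # static_order() yields every vertex after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        for u in preds[v]:
            if u in best:  # only extend routes that start at the source
                candidate = best[u] + rewards[v] - edges[(u, v)]
                if v not in best or candidate > best[v]:
                    best[v] = candidate
                    parent[v] = u
    if sink not in best:
        return None  # the sink is unreachable from the source
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]
```

With rewards `{"s": 0, "a": 5, "b": 3, "t": 0}` and edge costs `{("s", "a"): 4, ("s", "b"): 1, ("a", "t"): 1, ("b", "t"): 1}`, the route through `b` nets 1 while the route through `a` nets 0, so the function returns `(1, ["s", "b", "t"])`.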