Safety Enhanced Designs in UAS Risk Monitoring and Collision Resolution

Collision-free path planning is a major challenge in managing fleets of unmanned aerial vehicles (UAVs), especially in uncertain environments. This work studies the design of UAV routing policies using multi-agent reinforcement learning and proposes a Multi-resolution, Multi-agent, Mean-field reinforcement learning algorithm, named 3M-RL, for flight planning in which multiple vehicles must avoid collisions with one another while moving toward their destinations. In this system, each UAV makes decisions based only on local observations and does not communicate with other UAVs. The algorithm trains a routing policy using an Actor-Critic neural network with multi-resolution observations: detailed local information combined with aggregated global information based on a mean-field approximation. In this way, the algorithm addresses the curse-of-dimensionality problem in multi-agent reinforcement learning and provides a scalable solution. The proposed algorithm is tested in several complex scenarios in both 2D and 3D space, and the simulation results show that 3M-RL yields good routing policies.

As a complement, this work also studies dynamic data communication between UAVs and a control center, where the control center must monitor the safety state of each UAV in the system in real time and the transition of each UAV's risk level is modeled as a Markov process. Because communication bandwidth is limited, the control center cannot communicate with all UAVs at the same time and must decide which UAVs to communicate with at each step. This dynamic learning problem under limited bandwidth is also discussed in this paper, with the objective of minimizing the total information entropy in real-time risk-level tracking. Simulations show that the proposed algorithm outperforms baseline policies such as Round Robin scheduling.
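To make the risk-tracking objective concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's dynamic learning algorithm. It assumes each UAV's risk level follows its own known 3-state Markov chain (built by the hypothetical helper `make_chain`), the control center keeps a belief vector per UAV, and at most `k` UAVs can report their exact risk level per time slot. The cost is the total Shannon entropy of all beliefs, averaged over slots. It compares Round Robin with a simple swapped-in heuristic, `greedy_entropy`, which polls the `k` UAVs whose beliefs are currently most uncertain. All function names, parameters, and transition probabilities are hypothetical.

```python
import numpy as np


def entropy(p: np.ndarray) -> float:
    """Shannon entropy in bits of a discrete distribution (zero entries ignored)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def make_chain(stay: float, n_levels: int = 3) -> np.ndarray:
    """Hypothetical risk-level chain: keep the level w.p. `stay`, else jump uniformly."""
    P = np.full((n_levels, n_levels), (1.0 - stay) / (n_levels - 1))
    np.fill_diagonal(P, stay)
    return P


def round_robin(belief: np.ndarray, t: int, k: int) -> list[int]:
    """Poll k UAVs in a fixed cyclic order, ignoring the beliefs."""
    n = belief.shape[0]
    return [(t * k + j) % n for j in range(k)]


def greedy_entropy(belief: np.ndarray, t: int, k: int) -> list[int]:
    """Poll the k UAVs whose predicted risk level is most uncertain right now."""
    h = np.array([entropy(b) for b in belief])
    return [int(i) for i in np.argsort(-h, kind="stable")[:k]]


def simulate(policy, chains: np.ndarray, k: int, steps: int = 200, seed: int = 0) -> float:
    """Average total belief entropy per slot when at most k UAVs report per slot.

    chains has shape (n_uav, n_levels, n_levels); chains[i] is UAV i's transition matrix.
    """
    rng = np.random.default_rng(seed)
    n_uav, n_levels, _ = chains.shape
    state = rng.integers(n_levels, size=n_uav)           # true (hidden) risk levels
    belief = np.full((n_uav, n_levels), 1.0 / n_levels)  # control-center beliefs
    total = 0.0
    for t in range(steps):
        # Each UAV's true risk level makes one Markov transition.
        state = np.array([rng.choice(n_levels, p=chains[i, s]) for i, s in enumerate(state)])
        # Without new reports, the center can only push its beliefs through the chains.
        belief = np.einsum("il,ilm->im", belief, chains)
        # Bandwidth limit: only the polled UAVs report their exact risk level.
        for i in policy(belief, t, k):
            belief[i] = 0.0
            belief[i, state[i]] = 1.0
        total += sum(entropy(b) for b in belief)
    return total / steps


if __name__ == "__main__":
    # Four stable UAVs and two volatile ones (hypothetical parameters).
    chains = np.stack([make_chain(0.98)] * 4 + [make_chain(0.6)] * 2)
    rr = simulate(round_robin, chains, k=2)
    gr = simulate(greedy_entropy, chains, k=2)
    print(f"round robin: {rr:.3f} bits/slot, greedy: {gr:.3f} bits/slot")
```

Because the true risk trajectories depend only on the seed and not on which UAVs are polled, both policies are evaluated on identical trajectories. The greedy rule spends bandwidth where beliefs diffuse fastest, which is the intuition for beating a fixed Round Robin order; it is myopic, however, and not guaranteed to minimize entropy over the whole horizon, which is one motivation for a learned policy.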