- Firat University Journal of Experimental and Computational Engineering
- Volume: 4, Issue: 3
A Hybrid Reinforcement Learning Approach for Cargo Delivery by Autonomous Drone
Authors: Ebru Karaköse, Batuhan Bayraktar
Pages: 580-603
DOI: 10.62520/fujece.1652790
Publication Date: 2025-10-20
Article Type: Research Paper
Abstract: The use of drones in the transportation sector, particularly for cargo delivery, is a challenging and still-limited problem that attracts significant attention. In this study, a drone operating in a simulation environment built with Unreal Engine starts from the center of the map without any external information, not even route information, and delivers cargo fully autonomously. The drone's missions include avoiding obstacles, remaining unaffected by weather conditions, locating the cargo vehicle, and delivering the cargo to its intended recipient. Three different algorithms, combined with RGB and depth cameras, were used for cargo transportation and navigation. Six combinations were created and compared across a variety of variables; each combination was trained for 150,000 steps and evaluated against predetermined metrics. The drone was trained with the reinforcement learning algorithms DQN, PPO, and a hybrid Joint-DQN algorithm, with an LSTM used for memory. These algorithms were tested and compared in the simulation environment, and each algorithm was run and evaluated separately with the RGB and depth cameras integrated into the drone. In this system, the drone earns positive rewards as it moves toward the target and negative rewards when it moves away; if it crashes into an obstacle, the simulation restarts. The results showed that the algorithms first learned to avoid obstacles and then found the correct path, and that, given sufficient learning time, the drone successfully completed its mission. When the models were compared in terms of performance, the DQN-RGB model learned fastest, while the PPO algorithms lagged behind all other models.
As a result, it was noted that although the proposed "Joint" layer slows down the learning rate, it produces a more stable and efficient model in the long run.

Keywords: Autonomous drone, Cargo delivery, Depth camera, Reinforcement learning, RGB camera.
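The abstract describes a distance-shaped reward: positive reward for moving toward the target, negative for moving away, and an episode reset on collision. A minimal sketch of such a step-reward function is below; the paper does not give its exact formulation, so the function name, gain, and crash penalty are illustrative assumptions, not the authors' implementation.

```python
def step_reward(prev_dist, curr_dist, crashed,
                gain=1.0, crash_penalty=-100.0):
    """Return (reward, done) for one simulation step.

    Hypothetical distance-shaped reward in the spirit of the abstract:
    progress toward the target earns a positive reward, regression a
    negative one, and a collision ends the episode (simulation restarts).
    """
    if crashed:
        # Collision: large negative reward and episode termination.
        return crash_penalty, True
    # Positive when the drone got closer to the target, negative otherwise.
    delta = prev_dist - curr_dist
    return gain * delta, False
```

In a typical setup this function would be called once per simulator step with the drone-to-target distances before and after the action, and the `done` flag would trigger the environment reset the abstract mentions.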
