Research on multi-UAV air combat maneuvering decision-making based on MASAC

Abstract

With the rapid advancement of intelligent and clustered unmanned technologies, autonomous decision-making and confrontation among multiple unmanned aerial vehicles (UAVs) have emerged as a prominent research focus among major military powers worldwide. Multi-UAV confrontation environments are characterized by high-dimensional action spaces, nonlinearity, and stringent real-time decision-making requirements, which pose significant challenges to existing decision-making algorithms. This paper therefore addresses real-time maneuvering decision-making for multiple UAVs in 2v2 close-range air combat. First, a multi-UAV confrontation simulation environment based on the agent-environment cyclic (AEC) game model is developed to resolve ambiguous reward allocation and dynamic variation in the number of agents. Second, a multi-agent soft actor-critic (MASAC) deep reinforcement learning method is proposed within a centralized training with decentralized execution (CTDE) framework, supplemented by a strategy training and optimization approach that incorporates curriculum learning. Furthermore, collaborative rewards are introduced alongside the mainline and process rewards to strengthen tactical coordination among UAVs and enhance the effectiveness of adversarial strategies. Finally, three-dimensional simulation experiments validate the effectiveness and stability of the proposed method.
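To make the AEC idea concrete, the following is a minimal sketch of a turn-based loop in which exactly one agent acts per step, so that each reward is credited to the agent that acted and an eliminated agent simply drops out of the turn order. It is an illustration only, not the paper's simulator: the class `ToyAECGame`, the function `combine_reward`, the agent names, and the weighted-sum form of the reward with its default weights are all assumptions introduced here.

```python
def combine_reward(mainline, process, collaborative, w_process=0.1, w_collab=0.05):
    """Weighted sum of the three reward terms named in the abstract.

    The weighted-sum form and the default weights are assumptions for
    illustration; the abstract does not state how the terms are combined.
    """
    return mainline + w_process * process + w_collab * collaborative


class ToyAECGame:
    """Hypothetical turn-based game: exactly one agent acts per step."""

    def __init__(self, agents):
        self.agents = list(agents)                # live agents, in turn order
        self.rewards = {a: 0.0 for a in agents}   # cumulative reward per agent
        self._idx = 0                             # position of the acting agent

    def current(self):
        """Return the agent whose turn it is."""
        return self.agents[self._idx]

    def step(self, reward, eliminated=None):
        """Credit `reward` to the acting agent, then advance the turn."""
        self.rewards[self.current()] += reward
        if eliminated in self.agents:
            pos = self.agents.index(eliminated)
            self.agents.pop(pos)
            # Removing an agent at or before the pointer shifts the turn order
            # left by one, so move the pointer back to stay aligned.
            if pos <= self._idx:
                self._idx -= 1
        if self.agents:
            self._idx = (self._idx + 1) % len(self.agents)
        else:
            self._idx = 0
```

Because only one agent acts between consecutive environment updates, there is no question of which agent a reward belongs to, and the number of live agents can change without resizing a joint action vector. For example, `ToyAECGame(["red_0", "red_1", "blue_0", "blue_1"])` followed by `step(1.0, eliminated="blue_0")` credits `red_0` and hands the next turn to `red_1`.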

Keywords

multi-UAV, multi-agent reinforcement learning, AEC game modeling, decision-making, curriculum learning
