Details
Document type: Journal article
Chinese title: 引入反事实基线的无人机集群对抗博弈方法
English title: UAV swarm adversarial game method with a counterfactual baseline
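Authors: 王尔申[1,2]; 陈纪浩[1]; 宏晨[3,4]; 刘帆[1]; 陈艾东[3,4]; 景竑元[3,4]
First author: 王尔申
Affiliations: [1] School of Electronic and Information Engineering, Shenyang Aerospace University, Shenyang 110136; [2] School of Civil Aviation, Shenyang Aerospace University, Shenyang 110136; [3] Multi-Agent Systems Research Center, Beijing Union University, Beijing 100101; [4] College of Robotics, Beijing Union University, Beijing 100101
First affiliation: School of Electronic and Information Engineering, Shenyang Aerospace University, Shenyang 110136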
Year: 2024
Volume: 54
Issue: 7
Pages: 1775-1792
Journal (Chinese): 中国科学:信息科学
Journal (English): Scientia Sinica (Informationis)
Indexed in: CSTPCD; Scopus; Peking University Core Journals (北大核心, 2023); CSCD (2023-2024)
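Funding: Supported by the National Key R&D Program of China (Grant No. 2018AAA0100804); the National Natural Science Foundation of China (Grant No. 62173237); Beijing Union University research projects (Grant Nos. ZK30202304, SK160202103, ZK50201911, ZK30202107); the Open Fund of the State Key Laboratory of Satellite Navigation System and Equipment Technology (Grant No. CEPNT2022A01); the Fundamental Research Funds for Undergraduate Universities of Liaoning Province (Grant Nos. 20240177, 20240215); and the Shenyang Science and Technology Plan Project (Grant No. 22-322-3-34).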
Language: Chinese
Chinese keywords: 无人机集群; 对抗博弈; 多智能体; 深度强化学习; 纳什均衡
English keywords: UAV swarm; confrontation game; multi-agent; deep reinforcement learning; Nash equilibrium
Abstract: Unmanned aerial vehicles (UAVs) are increasingly used in cooperative adversarial games, and UAV swarms play a growing role in confrontation tasks such as cooperative detection, all-domain confrontation, and strategic deception. Reliable and efficient UAV swarm game methods are therefore an active research topic. This paper introduces the counterfactual baseline into the UAV swarm adversarial game setting and proposes a UAV swarm adversarial game method based on counterfactual multi-agent policy gradients (COMA). In a UAV confrontation environment with infinite, continuous state and action spaces, and building on a UAV dynamics model, we define realistic hit conditions and reward functions and construct a UAV swarm adversarial game model based on multi-agent deep reinforcement learning. The red and blue UAV swarms adopt different adversarial game methods, and asymmetric confrontation experiments are conducted in the multi-agent particle environment (MPE). The experimental results show that the average cumulative reward converges to a Nash equilibrium. In the 4 vs. 8 adversarial decision-making scenario, the average hit rate of COMA is 39% and 17% higher than that of DQN and MADDPG, respectively, and its average win rate is 34% and 17% higher, respectively. Finally, an in-depth analysis of the convergence and stability of COMA supports its practicality and robustness for UAV swarm adversarial game tasks.
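The abstract names the counterfactual baseline but this record does not show how it is computed. As a minimal sketch of the general COMA idea (not the authors' implementation), the advantage for one agent is the critic's value of the joint action actually taken minus a policy-weighted average over that agent's alternative actions, with the other agents' actions held fixed. The function name, the argument names, and the assumption of a small discrete action set are all illustrative; the paper itself describes continuous states and actions, which this sketch does not cover.

```python
def counterfactual_advantage(q_values, policy, taken_action):
    """COMA-style advantage for a single agent (illustrative sketch).

    q_values[k]  : hypothetical critic estimate of Q(s, (u_others, k)), i.e. the
                   joint-action value when this agent picks action k and all
                   other agents' actions stay fixed.
    policy[k]    : this agent's probability of choosing action k.
    taken_action : index of the action the agent actually took.
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    # Counterfactual baseline: expected Q over this agent's own actions.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[taken_action] - baseline


# Toy numbers chosen only for illustration (three discrete actions).
q = [1.0, 3.0, 2.0]
pi = [0.5, 0.25, 0.25]
adv = counterfactual_advantage(q, pi, taken_action=1)
```

Because the baseline depends only on the other agents' actions and the agent's own policy, the policy-weighted average of this advantage over the agent's actions is zero, which is why subtracting it does not bias the policy gradient.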