
Detailed Information

UAV cluster cooperative combat decision-making method based on deep reinforcement learning

Document Type: Journal Article

Chinese Title: 基于深度强化学习的无人机集群协同作战决策方法

English Title: UAV cluster cooperative combat decision-making method based on deep reinforcement learning

Authors: 赵琳[1]; 吕科[1]; 郭靖[2]; 宏晨[3]; 向贤财[1]; 薛健[1]; 王泳[4]

First Author: 赵琳

Affiliations: [1] School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049, China; [2] College of Electronic and Information Engineering, Shenyang Aerospace University, Shenyang 110136, China; [3] College of Robotics, Beijing Union University, Beijing 100101, China; [4] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China

First Affiliation: School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049, China

Year: 2023

Volume: 43

Issue: 11

Pages: 3641-3646

Chinese Journal Title: 计算机应用

English Journal Title: Journal of Computer Applications

Indexed in: CSTPCD; Peking University Core Journals (2020 edition); CSCD Expanded Edition (2023-2024)

Funding: National Key Research and Development Program of China (2018AAA0100804).

Language: Chinese

Keywords: Unmanned Aerial Vehicle (UAV); multi-cluster; public goods game; Multi-Agent Deep Deterministic Policy Gradient (MADDPG); cooperative combat decision-making method

Abstract: When an Unmanned Aerial Vehicle (UAV) cluster attacks ground targets, it is divided into two formations: a strike UAV cluster that attacks the main targets and an auxiliary UAV cluster that pins down the enemy. When the auxiliary UAVs choose between the two action strategies of attacking aggressively and preserving strength, the mission scenario resembles a public goods game, in which a cooperator's payoff is lower than a defector's. On this basis, a deep reinforcement learning based cooperative combat decision-making method for UAV clusters was proposed. First, a UAV cluster combat model based on the public goods game was built to simulate the conflict between individual and collective interests in the cooperation of intelligent UAV clusters. Then, the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm was used to solve for the most reasonable combat decision of the auxiliary UAV cluster, so that the cluster achieves victory at the minimum loss cost. Training and experiments were carried out with different numbers of UAVs. The results show that, compared with the training results of the IDQN (Independent Deep Q-Network) and ID3QN (Imitative Dueling Double Deep Q-Network) algorithms, the proposed algorithm has the best convergence; its winning rate reaches 100% with four auxiliary UAVs, and it also significantly outperforms the comparison algorithms with other numbers of UAVs.
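To make the public goods game structure described in the abstract concrete, the following is a minimal illustrative sketch, not the paper's actual combat model: the multiplication factor r, unit contribution cost c, and single-round payoff form are assumptions of a standard linear public goods game, chosen only to show why each cooperating (aggressively attacking) auxiliary UAV earns less in a single round than a defecting (strength-preserving) one.

# Illustrative single-round public goods game payoffs for the auxiliary UAV cluster.
# NOTE: r (multiplication factor), c (attack cost) and the payoff form below are
# hypothetical illustration values, not the payoff terms defined in the paper.

def pgg_payoffs(n_cooperators: int, n_total: int, r: float = 1.6, c: float = 1.0):
    """Return (cooperator_payoff, defector_payoff) for one round.

    Each cooperator (UAV that attacks aggressively) pays cost c into a common
    pool; the pool is multiplied by r and shared equally among all n_total UAVs.
    Defectors (UAVs that preserve their strength) share the pool without paying.
    """
    pool = r * c * n_cooperators
    share = pool / n_total
    return share - c, share  # cooperator pays the cost, defector free-rides


if __name__ == "__main__":
    n = 4  # four auxiliary UAVs, the setting where the paper reports a 100% win rate
    for k in range(1, n + 1):  # with zero cooperators everyone simply gets 0
        coop, defect = pgg_payoffs(k, n)
        print(f"{k} cooperators: cooperator payoff = {coop:.2f}, defector payoff = {defect:.2f}")
    # For 1 < r < n a defector always out-earns a cooperator by c in a single round,
    # even though the cluster's total payoff c*k*(r-1) grows with the number of cooperators k.

MADDPG's usual centralized-training, decentralized-execution design (a critic conditioned on all agents' observations and actions during training, with per-agent actors at execution) is the standard way such methods handle this kind of mixed individual/collective incentive, and the abstract indicates it is the mechanism used here to drive the auxiliary cluster toward the cooperative joint decision at minimum loss cost.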

