
Record details

基于ACVAE-MPPO算法的端到端自动驾驶算法研究    

Research on End-to-End Autonomous Driving Algorithm Based on ACVAE-MPPO Algorithm

Document type: Journal article

Chinese title: 基于ACVAE-MPPO算法的端到端自动驾驶算法研究

English title: Research on End-to-End Autonomous Driving Algorithm Based on ACVAE-MPPO Algorithm

Authors: Yu Kanghong (于康鸿) [1,2]; Zhang Jun (张军) [2]; Liu Yuansheng (刘元盛) [2]

First author: Yu Kanghong (于康鸿)

Affiliations: [1] Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China; [2] College of Robotics, Beijing Union University, Beijing 100101, China

First affiliation: Beijing Key Laboratory of Information Service Engineering, Beijing Union University

Year: 2026

Volume: 62

Issue: 4

Pages: 210-223

Chinese journal title: 计算机工程与应用

English journal title: Computer Engineering and Applications

Indexed in: Peking University Core Journals (北大核心, 2023 edition)

Funding: National Natural Science Foundation of China (62371013); Beijing Municipal Universities High-Level Research and Innovation Team Building Support Program (BPHR20220121); Beijing Union University research project (ZK20202404).

Language: Chinese

Chinese keywords: 变分自编码器; 近端策略优化算法; 深度强化学习; 自动驾驶

English keywords: variational auto-encoder; proximal policy optimization algorithm; deep reinforcement learning; autonomous driving

Abstract: Due to the diversity of road types, the multitude of interacting entities, and the complexity of the environment, achieving efficient autonomous driving in urban settings is one of the central focuses and challenges of current autonomous driving research. In end-to-end reinforcement learning for autonomous driving, the representation model often lacks sufficient feature extraction capability, and the decision model struggles to learn historical relationships between features; both limitations degrade decision-making performance in complex urban environments. To address these issues, a new deep reinforcement learning algorithm, ACVAE-MPPO, is proposed. To improve feature extraction accuracy, a coordinate convolutional layer is added to the variational auto-encoder (VAE) and a discriminator is used for auxiliary training, forming the auxiliary training coordinate convolutional variational auto-encoder (ACVAE). To strengthen the decision model's ability to exploit historical information, a long short-term memory (LSTM) network is integrated into the proximal policy optimization (PPO) algorithm, forming the memory proximal policy optimization (MPPO) algorithm, which enables PPO to remember and effectively use temporal information and thus improve decision accuracy. The two models are combined to form the ACVAE-MPPO algorithm. Experiments in the Carla simulator show that ACVAE-MPPO exhibits stronger decision-making capability, achieving more stable driving decisions with higher success rates.
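The abstract's ACVAE builds on coordinate convolution, whose core idea is to append normalized x/y coordinate channels to the image tensor before convolution so that the encoder can learn position-sensitive features. A minimal NumPy sketch of that input augmentation follows; the function name and tensor shapes are illustrative, not taken from the paper.

```python
import numpy as np

def add_coord_channels(batch):
    """Append normalized y/x coordinate channels to an NCHW batch,
    as in the CoordConv augmentation the ACVAE encoder builds on."""
    n, c, h, w = batch.shape
    # Coordinates normalized to [-1, 1] along height and width.
    ys = np.linspace(-1.0, 1.0, h).reshape(1, 1, h, 1)
    xs = np.linspace(-1.0, 1.0, w).reshape(1, 1, 1, w)
    y_chan = np.broadcast_to(ys, (n, 1, h, w))
    x_chan = np.broadcast_to(xs, (n, 1, h, w))
    # The convolution that follows then sees c + 2 input channels.
    return np.concatenate([batch, y_chan, x_chan], axis=1)

imgs = np.zeros((2, 3, 4, 4))  # toy 3-channel 4x4 images
out = add_coord_channels(imgs)
print(out.shape)  # → (2, 5, 4, 4)
```

In a CoordConv layer these two extra channels are concatenated immediately before an ordinary convolution, letting filters condition on where in the frame a feature appears, which is what the abstract credits for the improved feature-extraction accuracy.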

