Record Details
Research on Behavioral Decision at an Unsignalized Roundabout for Automatic Driving Based on Proximal Policy Optimization Algorithm (indexed in SCI-EXPANDED)
Document type: Journal article
Title (English): Research on Behavioral Decision at an Unsignalized Roundabout for Automatic Driving Based on Proximal Policy Optimization Algorithm
Authors: Gan, Jingpeng [1]; Zhang, Jiancheng [2]; Liu, Yuansheng [2]
First author: Gan, Jingpeng
Corresponding author: Zhang, JC [1]
Affiliations: [1] Beijing Union Univ, Coll Urban Rail Transit & Logist, Beijing 100101, Peoples R China; [2] Beijing Union Univ, Coll Robot, Beijing 100101, Peoples R China
First affiliation: Beijing Union University
Corresponding affiliation: [1] (corresponding author) Beijing Union Univ, Coll Robot, Beijing 100101, Peoples R China | [1141739] College of Robotics, Beijing Union University; [11417] Beijing Union University
Year: 2024
Volume: 14
Issue: 7
Journal: APPLIED SCIENCES-BASEL
Indexed in: Scopus (accession no. 2-s2.0-85192441224); WOS: SCI-EXPANDED (accession no. WOS:001200832100001)
Funding: No statement available
Language: English
Keywords: autonomous vehicle; deep reinforcement learning; optimized PPO algorithm; unsignalized roundabout; gap acceptance theory
Abstract: Unsignalized roundabouts have a significant impact on traffic flow and vehicle safety. To address the challenge of autonomous vehicles passing through roundabouts at low penetration rates, and to improve their efficiency while ensuring safety and stability, we propose an enhanced decision-making approach based on the proximal policy optimization (PPO) algorithm. We develop an optimization-based behavioral choice model for autonomous vehicles that combines gap acceptance theory with deep reinforcement learning via the PPO algorithm. In addition, we employ a CoordConv network to build an aerial view for gathering spatial perception information, and we introduce a dynamic multi-objective reward mechanism that quantifies environmental factors while maximizing the PPO algorithm's reward pool function. Simulation experiments show that the optimized PPO algorithm significantly improves training efficiency, raising the reward value function by 2.85%, 7.17%, and 19.58% in scenarios with 20, 100, and 200 social vehicles, respectively, compared with the PPO+CCMR algorithm. The effectiveness of simulation training also increases by 11.1%, 13.8%, and 7.4%, and crossing time is reduced by 2.37%, 2.62%, and 13.96%. The optimized PPO algorithm also improves path selection during simulation training: autonomous vehicles increasingly tend to drive in the inner ring over time, although the influence of social vehicles on path selection diminishes as their number increases. The safety of autonomous vehicles remains largely unaffected by the optimized PPO algorithm.
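The two core technical ingredients named in the abstract, the PPO clipped surrogate objective and CoordConv-style coordinate channels for the aerial view, can be sketched generically as follows. This is an illustrative sketch in NumPy, not the authors' implementation: the function names (`ppo_clip_loss`, `add_coord_channels`), the clipping parameter default, and the channel layout are all assumptions.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective, negated so lower is better.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: estimated advantages.
    """
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (element-wise minimum) bound, averaged over the batch
    return -np.mean(np.minimum(unclipped, clipped))

def add_coord_channels(img):
    """Append normalized x/y coordinate channels to an (H, W, C) image,
    in the spirit of CoordConv, so a convolution over the aerial view can
    exploit absolute position (e.g., distance from the roundabout center)."""
    h, w = img.shape[:2]
    xs = np.tile(np.linspace(-1.0, 1.0, w)[None, :], (h, 1))
    ys = np.tile(np.linspace(-1.0, 1.0, h)[:, None], (1, w))
    return np.concatenate([img, xs[..., None], ys[..., None]], axis=-1)
```

With identical old and new log-probabilities the ratio is 1 and the loss reduces to the negated mean advantage; the coordinate channels raise a 3-channel aerial image to 5 channels before it is fed to the convolutional policy network.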