Details
Autonomous Driving Decision Control Based on Improved Proximal Policy Optimization Algorithm (Indexed in SCI-EXPANDED)
Document Type: Journal Article
English Title: Autonomous Driving Decision Control Based on Improved Proximal Policy Optimization Algorithm
Authors: Song, Qingpeng[1,2]; Liu, Yuansheng[2,3]; Lu, Ming[4]; Zhang, Jun[3]; Qi, Han[1,2]; Wang, Ziyu[5]; Liu, Zijian[2,3]
First Author: Song, Qingpeng
Corresponding Author: Liu, YS[1]; Liu, YS[2]
Affiliations: [1]Beijing Union Univ, Coll Smart City, Beijing 100101, Peoples R China; [2]Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing 100101, Peoples R China; [3]Beijing Union Univ, Coll Robot, Beijing 100101, Peoples R China; [4]Beijing Union Univ, Coll Appl Sci & Technol, Beijing 100101, Peoples R China; [5]Beijing Union Univ, Coll Urban Rail Transit & Logist, Beijing 100101, Peoples R China
First Affiliation: Beijing Union University
Corresponding Institutions: [1](corresponding author) Beijing Union Univ, Beijing Key Lab Informat Serv Engn, Beijing 100101, Peoples R China; [2](corresponding author) Beijing Union Univ, Coll Robot, Beijing 100101, Peoples R China
Year: 2023
Volume: 13
Issue: 11
Journal (English): APPLIED SCIENCES-BASEL
Indexed in: Scopus (Accession No. 2-s2.0-85161588430); WOS SCI-EXPANDED (Accession No. WOS:001004694200001)
Funding: This work is supported in part by the National Key R&D Program under Grant 2021YFC3001300, in part by the National Natural Science Foundation of China Key Project Collaboration under Grant 61931012, in part by the Natural Science Foundation of Beijing under Grant 4222025, in part by the Science and Technique General Program of Beijing Municipal Commission of Education under Grant No. KM202011417001, and in part by the Academic Research Projects of Beijing Union University under Grants ZK10202208 and ZK90202106.
Language: English
Keywords: deep learning; reinforcement learning; sparse reward environments; autonomous driving; decision and control
Abstract: Decision-making control for autonomous driving in complex urban road environments is a difficult research problem. To address the high-dimensional state space and sparse rewards of autonomous driving decision control in this environment, this paper proposes Coordinated Convolution Multi-Reward Proximal Policy Optimization (CCMR-PPO). The method reduces the dimensionality of bird's-eye-view data through a coordinated convolution network and then fuses the processed data with vehicle state data as the algorithm's input, optimizing the state space. The vehicle's control commands, acc (representing throttle and brake) and steer, serve as the algorithm's output. Comprehensively considering the vehicle's lateral error, safety distance, speed, and other factors, a multi-objective reward mechanism was designed to alleviate reward sparsity. Experiments on the CARLA simulation platform show that the proposed method effectively improves performance: compared with the PPO algorithm, lane-line crossings are reduced by 24% and the number of completed tasks is increased by 54%.
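The multi-objective reward described in the abstract (combining lateral error, safety distance, and speed) could be sketched as below. This is a minimal illustration under assumed weights, thresholds, and function names; the paper's actual reward terms and coefficients are not given in this record.

```python
import math

def multi_objective_reward(lateral_error, safety_distance, speed,
                           target_speed=8.0, min_safe_distance=5.0):
    """Hypothetical multi-objective reward for a driving policy.

    Combines the factors named in the abstract; all weights and
    thresholds here are illustrative assumptions.
    """
    # Penalize deviation from the lane center (meters).
    r_lateral = -abs(lateral_error)
    # Penalize violating a minimum safe following distance (meters).
    r_safety = -1.0 if safety_distance < min_safe_distance else 0.0
    # Reward driving near the target speed (Gaussian-shaped term, m/s).
    r_speed = math.exp(-((speed - target_speed) ** 2) / (2 * 2.0 ** 2))
    return r_lateral + r_safety + r_speed
```

A dense, shaped reward of this kind gives the agent a learning signal at every step, which is the general idea behind alleviating sparse rewards in such environments.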