
Record details

基于深度强化学习与动态运动基元的自动驾驶类人轨迹规划    

Human-Like Trajectory Planning for Autonomous Vehicles Based on Deep Reinforcement Learning and Dynamic Movement Primitives

Document type: Journal article

Authors: 修丽梅 [1]; 刘元盛 [1]

Affiliation: [1] College of Robotics (College of Artificial Intelligence), Beijing Union University, Beijing 100101, China

First affiliation: College of Robotics, Beijing Union University

Year: 2025

Issue: 12

Pages: 10-18

Chinese journal title: 汽车技术

English journal title: Automobile Technology

Indexed in: Peking University Core Journals list (2023 edition)

Funding: National Natural Science Foundation of China (62371013).

Language: Chinese

Keywords: Autonomous driving; Trajectory planning; Dynamic movement primitives; Deep reinforcement learning; Safety-constrained obstacle avoidance

Abstract: To address vehicle control jitter and trajectory discontinuity caused by existing reinforcement learning methods that directly output low-level control commands, a hierarchical control framework integrating deep reinforcement learning (DRL) with dynamic movement primitives (DMP) is proposed. The autonomous driving control task is divided into two stages: high-level semantic decision-making and low-level trajectory generation. The high-level DRL module outputs driving intentions and DMP control parameters from real-time environmental observations. At the low level, DMP models human demonstration data to learn latent feature representations of driving skills, and a dynamic obstacle-avoidance coupling term built from the positions of obstacles and surrounding vehicles generates continuous, smooth collision-avoidance trajectories consistent with human driving habits. Highway lane changing is used as a representative test scenario for comparative simulation. The results show significant advantages over baseline methods in policy convergence, training stability, lane-change success rate, trajectory smoothness, and steering continuity.
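The abstract describes trajectory generation with a DMP whose dynamics include a coupling term that repels the state from obstacles. As a rough illustration only (the paper's actual formulation, gains, and coupling-term shape are not given in this record), a minimal 1-D discrete DMP with a hypothetical Gaussian repulsive coupling term might look like:

```python
import numpy as np

def dmp_rollout(y0, goal, weights, obstacle=None, tau=1.0, dt=0.01,
                alpha=25.0, beta=6.25, alpha_x=8.0, gamma=200.0, k=5.0):
    """Integrate a 1-D discrete DMP:
        tau * v' = alpha * (beta * (goal - y) - v) + f(x) + C_obs
    where f(x) is a weighted-basis forcing term driven by phase x,
    and C_obs is an illustrative obstacle-repulsion coupling term.
    All gains here are assumptions, not values from the paper."""
    n_basis = len(weights)
    centers = np.exp(-alpha_x * np.linspace(0, 1, n_basis))  # basis centers in phase
    widths = n_basis ** 1.5 / centers
    y, v, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(1.0 / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)
        forcing = (psi @ weights) / (psi.sum() + 1e-10) * x * (goal - y0)
        # Hypothetical coupling term: pushes the state away from a point obstacle,
        # fading out with squared distance (analogous in spirit to the paper's
        # dynamic obstacle-avoidance coupling built from obstacle positions)
        c_obs = 0.0
        if obstacle is not None:
            d = y - obstacle
            c_obs = gamma * d * np.exp(-k * d * d)
        a = (alpha * (beta * (goal - y) - v) + forcing + c_obs) / tau
        v += a * dt
        y += v * dt / tau
        x += -alpha_x * x * dt / tau  # first-order canonical (phase) system
        traj.append(y)
    return np.array(traj)
```

For example, `dmp_rollout(0.0, 3.5, np.zeros(10), obstacle=1.5)` traces a lateral profile from the current lane offset to a target offset of 3.5 m while bending around an obstacle at 1.5 m; in the hierarchical scheme described above, the high-level policy would supply `goal` and `weights` rather than raw steering commands.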

