
Details

Feasible Reinforcement Learning for Safety Ensurance with Stabilized Training (indexed in EI)

Document type: Journal article

English title: Feasible Reinforcement Learning for Safety Ensurance with Stabilized Training

Authors: Wang, Zuoshuai[1]; Lyu, Yao[2]; Zhang, Congsheng[1]; Xin, Zhe[1]; Gao, Jiaxin[2]; Jiang, Beiyan[2,3]; Li, Shengbo Eben[2]

First author: Wang, Zuoshuai

Corresponding authors: Xin, Z[1]; Li, SE[2]

Affiliations: [1] China Agr Univ, Coll Engn, Beijing 100083, Peoples R China; [2] Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China; [3] Beijing Union Univ, Coll Robot, Beijing 100027, Peoples R China

First affiliation: China Agr Univ, Coll Engn, Beijing 100083, Peoples R China

Corresponding affiliations: [1] (corresponding author) China Agr Univ, Coll Engn, Beijing 100083, Peoples R China; [2] (corresponding author) Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China

Year: 2026

Journal: AUTOMOTIVE INNOVATION

Indexing: EI (accession no. 20260319910132); WOS: ESCI (accession no. WOS:001658165200001)

Funding: This work is supported by NSF China under Grant 52221005, and by the Tsinghua University-Didi Joint Research Center for Future Mobility.

Language: English

Keywords: Autonomous driving; Constrained reinforcement learning; Safety guarantee; Training stability; Trust region

Abstract: Current constrained reinforcement learning (RL) algorithms face challenges such as slow policy learning, unstable training, and heavy hyperparameter tuning due to their nonconvex optimization nature, often resulting in suboptimal convergence or divergence. To address these issues, this paper proposes the feasible optimization with monotonic improvement (FOMI) algorithm, which guarantees constraint satisfaction and monotonic policy improvement. First, the primal problem is simplified using a Taylor expansion and reconstructed with a trust region constraint, reducing complexity and improving its optimization characteristics. Then, a feasible optimization framework is established for the reconstructed problem, which is decomposed into performance improvement and feasibility recovery subproblems to obtain a policy that improves performance while satisfying constraints. An analytic solution for the reconstructed problem is derived, eliminating backpropagation during network training and accelerating policy learning. Building upon this framework, FOMI incorporates neural networks as function carriers for continuous control tasks. Simulations validate that FOMI exhibits excellent training stability and a two-fold increase in learning speed compared to baselines. Real-world experiments verify the effectiveness of FOMI, highlighting its potential for tackling complex real-world tasks.
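The decomposition described in the abstract — linearize the objective and constraint, bound the update with a trust region, then switch between a performance-improvement step and a feasibility-recovery step — can be illustrated with a minimal toy sketch. This is NOT the paper's actual FOMI algorithm (which operates on policy networks with a KL-divergence trust region); it is a simplified two-phase analytic update on a first-order model with a Euclidean trust-region ball, and the function name and interface are invented for illustration:

```python
import numpy as np

def analytic_trust_region_step(g, b, c, delta):
    """Toy two-phase update on linearized models.

    g     : gradient of the performance objective (to maximize)
    b     : gradient of the safety constraint function
    c     : current constraint value (c <= 0 means the constraint is satisfied)
    delta : trust-region radius (Euclidean ball here, for an analytic solution)

    Returns the step and which subproblem produced it.
    """
    # Performance-improvement candidate: steepest ascent, scaled to the
    # trust-region boundary (the analytic maximizer of g.x s.t. ||x|| <= delta).
    step = delta * g / np.linalg.norm(g)
    # First-order prediction of the constraint value after taking the step.
    if c + b @ step <= 0.0:
        return step, "improve"
    # Feasibility recovery: the candidate is predicted to violate the
    # constraint, so instead descend the constraint as fast as possible
    # within the trust region (analytic minimizer of b.x s.t. ||x|| <= delta).
    step = -delta * b / np.linalg.norm(b)
    return step, "recover"
```

With `g = [1, 0]`, `b = [0, 1]`, and `delta = 0.5`, a safely satisfied constraint (`c = -1`) yields the pure ascent step `[0.5, 0]`, while a violated one (`c = 0.2`) triggers the recovery step `[0, -0.5]`. The analytic form of both subproblem solutions is what lets this kind of scheme avoid an inner iterative solve, mirroring the abstract's point about eliminating backpropagation in the update step.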

