Details
Feasible Reinforcement Learning for Safety Ensurance with Stabilized Training (EI-indexed)
Document Type: Journal Article
English Title: Feasible Reinforcement Learning for Safety Ensurance with Stabilized Training
Authors: Wang, Zuoshuai[1]; Lyu, Yao[2]; Zhang, Congsheng[1]; Xin, Zhe[1]; Gao, Jiaxin[2]; Jiang, Beiyan[2,3]; Li, Shengbo Eben[2]
First Author: Wang, Zuoshuai
Corresponding Authors: Xin, Z[1]; Li, SE[2]
Affiliations: [1]China Agr Univ, Coll Engn, Beijing 100083, Peoples R China; [2]Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China; [3]Beijing Union Univ, Coll Robot, Beijing 100027, Peoples R China
First Affiliation: China Agr Univ, Coll Engn, Beijing 100083, Peoples R China
Corresponding Affiliations: [1](corresponding author) China Agr Univ, Coll Engn, Beijing 100083, Peoples R China; [2](corresponding author) Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China.
Year: 2026
Journal: AUTOMOTIVE INNOVATION
Indexed In: EI (Accession No.: 20260319910132); WOS: ESCI (Accession No.: WOS:001658165200001)
Funding: This work is supported by NSF China under Grant 52221005 and the Tsinghua University-Didi Joint Research Center for Future Mobility.
Language: English
Keywords: Autonomous driving; Constrained reinforcement learning; Safety guarantee; Training stability; Trust region
Abstract: Current constrained reinforcement learning (RL) algorithms face challenges such as slow policy learning, unstable training, and heavy hyperparameter tuning due to their nonconvex optimization nature, often resulting in suboptimal convergence or divergence. To address these issues, this paper proposes the feasible optimization with monotonic improvement (FOMI) algorithm, which guarantees constraint satisfaction and monotonic policy improvement. First, the primal problem is simplified using a Taylor expansion and reconstructed with a trust region constraint, reducing complexity and improving its optimization characteristics. Then, a feasible optimization framework is established for the reconstructed problem, which is decomposed into performance improvement and feasibility recovery subproblems to obtain a policy that improves performance while satisfying constraints. An analytic solution to the reconstructed problem is derived, eliminating backpropagation during network training and accelerating policy learning. Building upon this framework, FOMI incorporates neural networks as function carriers for continuous control tasks. Simulations validate that FOMI exhibits excellent training stability and a two-fold increase in learning speed compared to baselines. Real-world experiments verify the effectiveness of FOMI, highlighting its potential for tackling complex real-world tasks.
