Details
CVaR-based Thompson Sampling for Factored Multi-agent Multi-Armed Risk-aware Bandit (EI indexed)
Document type: Journal article
English title: CVaR-based Thompson Sampling for Factored Multi-agent Multi-Armed Risk-aware Bandit
Authors: Li, Jiahong[1]; Tian, Huiyu[2]
First author: Li, Jiahong (李佳洪)
Affiliations: [1] College of Robotics, Beijing Union University, Beijing, China; [2] Department of Science and Technology, Beijing Normal University - Hong Kong Baptist University United International College, Zhuhai, China
First affiliation: College of Robotics, Beijing Union University
Year: 2024
Pages: 7167-7172
Foreign-language journal title: Proceedings - 2024 China Automation Congress, CAC 2024
Indexed in: EI (accession number: 20251118057524)
Language: English
Foreign-language keywords: Adaptive boosting - Gaussian distribution - Polynomial approximation - Risk assessment - Risk management - Risk perception
Abstract: Risk-aware online learning and bandit algorithms have been successfully applied in scenarios where optimizing the expected reward is not sufficient to achieve desired outcomes due to the presence of uncertainty. However, distributed decision-making under uncertainty in multi-agent systems has become increasingly important for addressing real-world tasks. Our research focuses on developing a distributed risk-aware decision-making approach that optimizes a shared objective in a network of loosely coupled agents. Specifically, we address the problem of regret minimization in the multi-agent multi-armed risk-aware bandit (MAMARAB) framework, where each arm follows a Gaussian distribution with bounded variances and is evaluated using Conditional Value at Risk (CVaR). To tackle this problem, we propose a novel CVaR-based multi-agent Thompson sampling (CVaR-MATS) algorithm, which selects actions based on the CVaR measure to balance exploration and exploitation while taking into account the risk associated with each arm. Moreover, to provide theoretical guarantees, we derive a regret bound for the CVaR-MATS algorithm that is sub-linear in time and exhibits low-order polynomial growth concerning the highest number of actions in sparse coordination graphs. © 2024 IEEE.
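Illustrative note: the abstract evaluates Gaussian arms by CVaR and selects actions by Thompson sampling. The sketch below is not the paper's CVaR-MATS algorithm. It is a simplified single-agent illustration under assumptions that do not come from this record: rewards are maximized, CVaR is taken as the mean of the worst alpha-fraction of rewards (lower tail), each arm's standard deviation is known, and the posterior over each arm's mean uses a flat prior. The function names `gaussian_cvar` and `thompson_step` are hypothetical.

```python
import random
from statistics import NormalDist


def gaussian_cvar(mu, sigma, alpha):
    """Lower-tail CVaR of N(mu, sigma^2) at level alpha (0 < alpha < 1).

    Closed form: mu - sigma * pdf(z_alpha) / alpha, where z_alpha is the
    standard normal alpha-quantile. The lower-tail convention is an assumption.
    """
    std = NormalDist()
    z = std.inv_cdf(alpha)
    return mu - sigma * std.pdf(z) / alpha


def thompson_step(counts, means, sigmas, alpha, rng):
    """Choose one arm by sampling each mean and scoring the sample by CVaR.

    Assumes every arm has been pulled at least once and sigma is known,
    so the posterior of the mean is N(empirical mean, sigma^2 / n).
    """
    scores = []
    for n, m, s in zip(counts, means, sigmas):
        sampled_mu = rng.gauss(m, s / n ** 0.5)
        scores.append(gaussian_cvar(sampled_mu, s, alpha))
    # Return the index of the arm with the highest sampled CVaR.
    return max(range(len(scores)), key=scores.__getitem__)
```

With two arms of equal mean, the arm with the smaller standard deviation scores a higher CVaR, so a risk-aware learner of this kind prefers it once both means are well estimated.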
