Details
CVaR-based Thompson Sampling for Factored Multi-agent Multi-Armed Risk-aware Bandit (EI-indexed)
Document type: Conference paper
English title: CVaR-based Thompson Sampling for Factored Multi-agent Multi-Armed Risk-aware Bandit
Authors: Li, Jiahong[1]; Tian, Huiyu[2]
First author: Li Jiahong (李佳洪)
Affiliations: [1] College of Robotics, Beijing Union University, Beijing, China; [2] Department of Science and Technology, Beijing Normal University - Hong Kong Baptist University United International College, Zhuhai, China
First affiliation: College of Robotics, Beijing Union University
Proceedings: Proceedings - 2024 China Automation Congress, CAC 2024
Conference dates: November 1, 2024 - November 3, 2024
Conference location: Qingdao, China
Language: English
Keywords (English): Adaptive boosting - Gaussian distribution - Polynomial approximation - Risk assessment - Risk management - Risk perception
Abstract: Risk-aware online learning and bandit algorithms have been successfully applied in scenarios where, because of uncertainty, optimizing the expected reward is not sufficient to achieve the desired outcomes. At the same time, distributed decision-making under uncertainty in multi-agent systems has become increasingly important for real-world tasks. Our research focuses on developing a distributed risk-aware decision-making approach that optimizes a shared objective over a network of loosely coupled agents. Specifically, we address regret minimization in the multi-agent multi-armed risk-aware bandit (MAMARAB) framework, where each arm follows a Gaussian distribution with bounded variance and is evaluated using the Conditional Value at Risk (CVaR). To tackle this problem, we propose a novel CVaR-based multi-agent Thompson sampling (CVaR-MATS) algorithm, which selects actions according to the CVaR measure, balancing exploration and exploitation while accounting for the risk associated with each arm. To provide theoretical guarantees, we derive a regret bound for CVaR-MATS that is sub-linear in time and grows as a low-order polynomial in the largest number of actions in sparse coordination graphs. © 2024 IEEE.
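The abstract does not spell out how CVaR-MATS works, so the following is only a minimal single-agent sketch of the general idea behind CVaR-based Thompson sampling for Gaussian arms, not the authors' algorithm. It assumes: rewards are "higher is better" and CVaR at level alpha is the mean of the worst alpha-fraction of outcomes; each arm's mean and variance are unknown and get a Normal-Gamma prior (the abstract only says variances are bounded); and the coordination-graph factorization across agents and the regret analysis are not reproduced. The names gaussian_cvar, CVaRThompsonSampling and run_demo, the prior parameters, and the toy arm values are all hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def gaussian_cvar(mu: float, sigma: float, alpha: float) -> float:
    """Lower-tail CVaR_alpha of a N(mu, sigma^2) reward (assumed convention).

    Treats larger rewards as better, so CVaR_alpha is the mean of the worst
    alpha-fraction of outcomes: mu - sigma * pdf(z_alpha) / alpha.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if alpha == 1.0:
        return mu  # whole distribution: CVaR reduces to the mean
    z = _STD_NORMAL.inv_cdf(alpha)
    return mu - sigma * _STD_NORMAL.pdf(z) / alpha


class CVaRThompsonSampling:
    """Hypothetical single-agent CVaR Thompson sampler for Gaussian arms.

    Each arm's unknown mean and variance get a Normal-Gamma prior. Each round
    it draws (mean, precision) from every posterior and plays the arm whose
    sampled Gaussian has the largest CVaR.
    """

    def __init__(self, n_arms, alpha, seed=None,
                 mu0=0.0, kappa0=1.0, a0=1.0, b0=1.0):
        self.alpha = alpha
        self.rng = random.Random(seed)
        self.prior = (mu0, kappa0, a0, b0)
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.m2 = [0.0] * n_arms  # running sum of squared deviations (Welford)

    def _posterior(self, arm):
        """Normal-Gamma posterior parameters (mu_n, kappa_n, a_n, b_n)."""
        mu0, k0, a0, b0 = self.prior
        n, xbar = self.counts[arm], self.means[arm]
        kn = k0 + n
        mun = (k0 * mu0 + n * xbar) / kn
        an = a0 + n / 2.0
        bn = b0 + 0.5 * self.m2[arm] + k0 * n * (xbar - mu0) ** 2 / (2.0 * kn)
        return mun, kn, an, bn

    def select_arm(self):
        best_arm, best_value = 0, -math.inf
        for arm in range(len(self.counts)):
            mun, kn, an, bn = self._posterior(arm)
            # Precision tau ~ Gamma(shape=a_n, rate=b_n); gammavariate takes a scale.
            tau = self.rng.gammavariate(an, 1.0 / bn)
            mu = self.rng.gauss(mun, 1.0 / math.sqrt(kn * tau))
            value = gaussian_cvar(mu, 1.0 / math.sqrt(tau), self.alpha)
            if value > best_value:
                best_arm, best_value = arm, value
        return best_arm

    def update(self, arm, reward):
        # Welford's online update of the arm's sample mean and squared deviations.
        self.counts[arm] += 1
        delta = reward - self.means[arm]
        self.means[arm] += delta / self.counts[arm]
        self.m2[arm] += delta * (reward - self.means[arm])


def run_demo(rounds=2000, seed=0):
    """Toy run: a risky arm N(1.0, 3.0^2) versus a safe arm N(0.8, 0.2^2)."""
    arms = [(1.0, 3.0), (0.8, 0.2)]  # (mean, std); made-up values for illustration
    env = random.Random(seed + 1)
    agent = CVaRThompsonSampling(n_arms=len(arms), alpha=0.1, seed=seed)
    for _ in range(rounds):
        arm = agent.select_arm()
        mean, std = arms[arm]
        agent.update(arm, env.gauss(mean, std))
    return agent.counts


if __name__ == "__main__":
    print(run_demo())
```

In the toy run, the risky arm has the higher mean, but at alpha = 0.1 its CVaR is roughly 1.0 - 3.0 * 1.755 ≈ -4.27, versus roughly 0.8 - 0.2 * 1.755 ≈ 0.45 for the safe arm, so under these assumptions the sampler should concentrate its pulls on the safe arm, whereas a mean-based Thompson sampler would tend to favor the risky one.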