Details

CVaR-based Thompson Sampling for Factored Multi-agent Multi-Armed Risk-aware Bandit (indexed in CPCI-S)

Document type: Conference paper

English title: CVaR-based Thompson Sampling for Factored Multi-agent Multi-Armed Risk-aware Bandit

Authors: Li, Jiahong[1]; Tian, Huiyu[2]

First author: Li Jiahong (李佳洪)

Corresponding author: Li, JH[1]

Affiliations: [1] Beijing Union Univ, Coll Robot, Beijing, Peoples R China; [2] Beijing Normal Univ, Dept Sci & Technol, Hong Kong Baptist Univ United Int Coll, Beijing, Peoples R China

First affiliation: College of Robotics, Beijing Union University

Corresponding affiliation: [1] (corresponding author), Beijing Union Univ, Coll Robot, Beijing, Peoples R China | [1141739] College of Robotics, Beijing Union University; [11417] Beijing Union University

Proceedings: 2024 China Automation Congress

Conference dates: NOV 01-03, 2024

Conference location: Qingdao, PEOPLES R CHINA

Language: English

Keywords: Multi-armed risk-aware bandit; Multi-agent system; Conditional Value at Risk; Thompson sampling

Abstract: Risk-aware online learning and bandit algorithms have been successfully applied in scenarios where optimizing the expected reward is not sufficient to achieve desired outcomes due to the presence of uncertainty. At the same time, distributed decision-making under uncertainty in multi-agent systems has become increasingly important for addressing real-world tasks. Our research focuses on developing a distributed risk-aware decision-making approach that optimizes a shared objective in a network of loosely coupled agents. Specifically, we address the problem of regret minimization in the multi-agent multi-armed risk-aware bandit (MAMARAB) framework, where each arm follows a Gaussian distribution with bounded variances and is evaluated using Conditional Value at Risk (CVaR). To tackle this problem, we propose a novel CVaR-based multi-agent Thompson sampling (CVaR-MATS) algorithm, which selects actions based on the CVaR measure to balance exploration and exploitation while taking into account the risk associated with each arm. Moreover, to provide theoretical guarantees, we derive a regret bound for the CVaR-MATS algorithm that is sub-linear in time and exhibits low-order polynomial growth with respect to the highest number of actions in sparse coordination graphs.
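The abstract describes choosing actions by posterior sampling while comparing arms through their CVaR. The Python sketch below illustrates that general idea under several assumptions that do not come from the paper: CVaR is taken as the lower-tail expected reward at level alpha, which for a Gaussian N(mu, sigma^2) has the closed form mu - sigma * phi(Phi^{-1}(alpha)) / alpha; the team objective is assumed to be the sum of local CVaRs over the factors of a coordination graph; each local arm gets a Normal-Gamma posterior with arbitrary prior hyperparameters; and the joint action is found by brute-force enumeration. The names gaussian_cvar, LocalArm, select_joint_action and run, as well as the three-agent environment, are hypothetical. This is a sketch of the technique, not the authors' CVaR-MATS implementation.

```python
import itertools
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for the closed-form Gaussian CVaR


def gaussian_cvar(mu, sigma, alpha):
    """Lower-tail CVaR_alpha of N(mu, sigma^2): expected reward in the worst alpha fraction."""
    z = _STD.inv_cdf(alpha)
    return mu - sigma * _STD.pdf(z) / alpha


class LocalArm:
    """Normal-Gamma posterior over the unknown mean and precision of one local Gaussian reward.

    The prior hyperparameters are arbitrary illustrative defaults (an assumption).
    """

    def __init__(self, mu0=0.0, kappa0=1.0, a0=1.0, b0=1.0):
        self.mu0, self.kappa0, self.a0, self.b0 = mu0, kappa0, a0, b0
        self.n, self.mean, self.m2 = 0, 0.0, 0.0  # Welford count, mean, sum of squared deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def sample_cvar(self, alpha, rng):
        """Draw (mean, precision) from the posterior and return the CVaR of that sampled Gaussian."""
        kappa_n = self.kappa0 + self.n
        mu_n = (self.kappa0 * self.mu0 + self.n * self.mean) / kappa_n
        a_n = self.a0 + self.n / 2
        b_n = (self.b0 + 0.5 * self.m2
               + self.kappa0 * self.n * (self.mean - self.mu0) ** 2 / (2 * kappa_n))
        tau = rng.gammavariate(a_n, 1.0 / b_n)  # shape a_n, scale 1/b_n (i.e. rate b_n)
        mu = rng.gauss(mu_n, 1.0 / math.sqrt(kappa_n * tau))
        return gaussian_cvar(mu, 1.0 / math.sqrt(tau), alpha)


def select_joint_action(factors, posteriors, n_agents, n_actions, alpha, rng):
    """Sample one CVaR per (factor, local action) and maximise their sum over joint actions.

    Brute-force enumeration keeps the sketch short; a sparse coordination graph
    would normally be handled with variable elimination instead.
    """
    sampled = {key: arm.sample_cvar(alpha, rng) for key, arm in posteriors.items()}
    best, best_val = None, -math.inf
    for joint in itertools.product(range(n_actions), repeat=n_agents):
        val = sum(sampled[(e, tuple(joint[i] for i in scope))]
                  for e, scope in enumerate(factors))
        if val > best_val:
            best, best_val = joint, val
    return best


def run(true_params, factors, n_agents, n_actions, alpha=0.1, horizon=2000, seed=0):
    """Simulate the sketch against a hypothetical environment of Gaussian local rewards."""
    rng = random.Random(seed)
    posteriors = {(e, la): LocalArm()
                  for e, scope in enumerate(factors)
                  for la in itertools.product(range(n_actions), repeat=len(scope))}
    joint = None
    for _ in range(horizon):
        joint = select_joint_action(factors, posteriors, n_agents, n_actions, alpha, rng)
        for e, scope in enumerate(factors):
            la = tuple(joint[i] for i in scope)
            mu, sigma = true_params[(e, la)]
            posteriors[(e, la)].update(rng.gauss(mu, sigma))  # observe each local reward
    return joint


# Hypothetical 3-agent chain: agents (0, 1) share factor 0, agents (1, 2) share factor 1.
FACTORS = [(0, 1), (1, 2)]
TRUE_PARAMS = {}
for e in range(len(FACTORS)):
    for la in itertools.product(range(2), repeat=2):
        TRUE_PARAMS[(e, la)] = (0.0, 0.1)
    TRUE_PARAMS[(e, (0, 0))] = (1.5, 3.0)  # highest mean, but a heavy lower tail
    TRUE_PARAMS[(e, (1, 1))] = (1.0, 0.1)  # lower mean, much better CVaR

final_joint = run(TRUE_PARAMS, FACTORS, n_agents=3, n_actions=2)
```

Under these made-up parameters the CVaR-optimal joint action is (1, 1, 1), with a summed CVaR of about 1.65, while the mean-optimal one is (0, 0, 0); whether the sampler settles on (1, 1, 1) within the horizon depends on the prior and the seed. Enumeration costs n_actions ** n_agents per round, which is why exploiting the sparsity of the coordination graph matters once there are more than a handful of agents.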
