SEN: A subword-based ensemble network for Chinese historical entity extraction (indexed in SCI-EXPANDED and EI)


English title: SEN: A subword-based ensemble network for Chinese historical entity extraction

Authors: Yan, Chengxi[1,2]; Wang, Ruojia[3]; Fang, Xiaoke[4]

First author: Yan, Chengxi

Corresponding author: Wang, RJ[1]

Affiliations: [1] Renmin Univ China, Sch Informat Resource Management, Beijing, Peoples R China; [2] Res Ctr Digital Humanities RUC, Beijing, Peoples R China; [3] Beijing Univ Chinese Med, Sch Management, Beijing, Peoples R China; [4] Beijing Union Univ, Coll Appl Arts & Sci, Beijing, Peoples R China

First affiliation: Renmin Univ China, Sch Informat Resource Management, Beijing, Peoples R China

Corresponding address: [1] Wang, RJ (corresponding author), Beijing Univ Chinese Med, Sch Management, Beijing, Peoples R China.




Funding: This research is supported by China Postdoctoral Science Foundation (No. 2021M703564) and National Social Science Foundation of China (No. 18CTQ041).


Keywords: Named entity recognition; Entity extraction; Neural network; Subword; Chinese history

Abstract: Understanding historical entity information (e.g., persons, locations, and time) plays an important role in reasoning about the development of historical events. With growing interest in digital humanities and natural language processing, named entity recognition (NER) provides a feasible solution for automatically extracting these entities from historical texts, especially in Chinese historical research. However, previous approaches are domain-specific, ineffective (with relatively low accuracy), and non-interpretable, which hinders the development of NER in Chinese history. In this paper, we propose a new hybrid deep learning model called "subword-based ensemble network" (SEN), which incorporates subword information and a novel attention fusion mechanism. Experiments on a massive self-built Chinese historical corpus, CMAG, show that SEN achieves the best performance among the advanced models compared, with 93.87% F1-micro and 89.70% F1-macro. Further investigation reveals that SEN has strong generalization ability for NER on Chinese historical texts: it is not only relatively insensitive to categories with fewer annotation labels (e.g., OFI) but can also accurately capture diverse local and global semantic relations. Our research demonstrates the effectiveness of integrating subword information and attention fusion, which provides an inspiring solution for the practical use of entity extraction in the Chinese historical domain.
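The abstract reports both F1-micro and F1-macro, and the gap between the two is what makes the remark about sparsely annotated categories such as OFI meaningful. The sketch below shows the standard definitions only; it is not the authors' evaluation code. The per-category counts and the labels `PER` and `LOC` are hypothetical, chosen for illustration, and are not taken from the paper or from CMAG.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one entity category
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_class: Dict[str, Counts]) -> float:
    """Pool the counts over all categories, then compute a single F1."""
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return f1(tp, fp, fn)


def macro_f1(per_class: Dict[str, Counts]) -> float:
    """Compute F1 per category, then take the unweighted mean."""
    scores = [f1(*c) for c in per_class.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical counts, NOT from the paper: two frequent categories
# and one rare category with weaker performance.
example: Dict[str, Counts] = {
    "PER": (900, 50, 50),
    "LOC": (450, 25, 25),
    "OFI": (30, 10, 20),
}

micro = micro_f1(example)
macro = macro_f1(example)
print(f"F1-micro = {micro:.4f}, F1-macro = {macro:.4f}")
```

Because micro-averaging pools counts, frequent categories dominate it, whereas macro-averaging weights every category equally. A rare category with a lower score therefore pulls F1-macro below F1-micro, as in this toy example.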


