Details

SEN: A subword-based ensemble network for Chinese historical entity extraction (indexed in SCI-EXPANDED and EI)

Document type: Journal article

English title: SEN: A subword-based ensemble network for Chinese historical entity extraction

Authors: Yan, Chengxi[1,2]; Wang, Ruojia[3]; Fang, Xiaoke[4]

First author: Yan, Chengxi

Corresponding author: Wang, RJ[1]

Affiliations: [1] Renmin Univ China, Sch Informat Resource Management, Beijing, Peoples R China; [2] Res Ctr Digital Humanities RUC, Beijing, Peoples R China; [3] Beijing Univ Chinese Med, Sch Management, Beijing, Peoples R China; [4] Beijing Union Univ, Coll Appl Arts & Sci, Beijing, Peoples R China

First affiliation: Renmin Univ China, Sch Informat Resource Management, Beijing, Peoples R China

Corresponding affiliation: [1] (corresponding author), Beijing Univ Chinese Med, Sch Management, Beijing, Peoples R China.

Year: 0

Journal: NATURAL LANGUAGE ENGINEERING

Indexed in: EI (accession no. 20233214490816); Scopus (accession no. 2-s2.0-85166485785); WOS: SSCI (accession no. WOS:000901362700001), SCI-EXPANDED (accession no. WOS:000901362700001), A&HCI (accession no. WOS:000901362700001)

Funding: This research is supported by China Postdoctoral Science Foundation (No. 2021M703564) and National Social Science Foundation of China (No. 18CTQ041).

Language: English

Keywords: Named entity recognition; Entity extraction; Neural network; Subword; Chinese history

Abstract: Understanding various historical entity information (e.g., persons, locations, and time) plays a very important role in reasoning about the developments of historical events. With the increasing concern about the fields of digital humanities and natural language processing, named entity recognition (NER) provides a feasible solution for automatically extracting these entities from historical texts, especially in Chinese historical research. However, previous approaches are domain-specific, ineffective with relatively low accuracy, and non-interpretable, which hinders the development of NER in Chinese history. In this paper, we propose a new hybrid deep learning model called "subword-based ensemble network" (SEN), by incorporating subword information and a novel attention fusion mechanism. The experiments on a massive self-built Chinese historical corpus CMAG show that SEN has achieved the best with 93.87% for F1-micro and 89.70% for F1-macro, compared with other advanced models. Further investigation reveals that SEN has a strong generalization ability of NER on Chinese historical texts, which is not only relatively insensitive to the categories with fewer annotation labels (e.g., OFI) but can also accurately capture diverse local and global semantic relations. Our research demonstrates the effectiveness of the integration of subword information and attention fusion, which provides an inspiring solution for the practical use of entity extraction in the Chinese historical domain.

Beijing Union University Institutional Repository
