
Details

基于HMM的京剧机构命名实体识别算法    

Algorithm of Beijing Opera Organization Names Entity Recognition Based on HMM

Document type: Journal article

Chinese title: 基于HMM的京剧机构命名实体识别算法

English title: Algorithm of Beijing Opera Organization Names Entity Recognition Based on HMM

Authors: 乐娟[1,2]; 赵玺[3]

First author: 乐娟

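Affiliations: [1] School of Computer Science, Beijing Institute of Technology (北京理工大学计算机学院); [2] Beijing Vocational College of Opera Arts (北京戏曲艺术职业学院); [3] Teachers' College, Beijing Union University (北京联合大学师范学院)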

First affiliation: School of Computer Science, Beijing Institute of Technology, Beijing 100081

Year: 2013

Volume: 39

Issue: 6

Pages: 266-271

Chinese journal name: 计算机工程

Foreign-language journal name: Computer Engineering

Indexed in: CSTPCD; Scopus; CSCD (CSCD2013_2014)

Funding: Beijing Outstanding Talent Training Program (2012D002002000001); Beijing Vocational College Teacher Quality Improvement Project

Language: Chinese

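Chinese keywords: 开放领域 (open domain); 命名实体识别 (named entity recognition); 隐马尔科夫模型 (hidden Markov model); Viterbi算法 (Viterbi algorithm); 规则树 (rule tree)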

English keywords: open-domain; Named Entity Recognition (NER); Hidden Markov Model (HMM); Viterbi algorithm; rule tree

Abstract: To address the low efficiency of organization named entity recognition, this paper proposes an algorithm for recognizing Beijing opera organization named entities based on the Hidden Markov Model (HMM). The HMM tags the parts of speech of the text segmentation results to resolve ambiguity, and the Viterbi algorithm computes the most probable part-of-speech sequence for a given segmentation. Following customized name recognition rules, the algorithm uses organization prefix and suffix lexicons to find the left and right boundaries of each organization name, identifies organization named entities in the corpus with an automaton algorithm, and loads the newly found words into the segmentation dictionary. An open test on a Beijing opera domain corpus shows that the recognition accuracy can reach 99%.
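
The Viterbi step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two-tag set (`n` for noun, `v` for verb), the three-word segmentation, and every probability below are hypothetical values chosen only to show how the most probable part-of-speech sequence is recovered for one segmentation result.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return (best_log_prob, best_tag_path) for a sequence of segmented words."""
    if not observations:
        return 0.0, []

    def lg(p):
        # Log-space avoids underflow; unseen events get a small floor probability.
        return math.log(max(p, floor))

    # scores[t][s]: best log-probability of any tag path ending in tag s at word t
    scores = [{s: lg(start_p.get(s, 0.0)) + lg(emit_p[s].get(observations[0], 0.0))
               for s in states}]
    back = [{}]  # back[t][s]: previous tag on the best path into s at word t

    for t in range(1, len(observations)):
        scores.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, scores[t - 1][p] + lg(trans_p[p].get(s, 0.0))) for p in states),
                key=lambda pair: pair[1],
            )
            scores[t][s] = score + lg(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Trace back from the best final tag.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return scores[-1][last], path


# Hypothetical toy model (all numbers are illustrative assumptions).
states = ("n", "v")
start_p = {"n": 0.7, "v": 0.3}
trans_p = {"n": {"n": 0.4, "v": 0.6}, "v": {"n": 0.8, "v": 0.2}}
emit_p = {
    "n": {"剧团": 0.5, "演出": 0.2, "剧目": 0.3},   # troupe, performance, repertoire
    "v": {"剧团": 0.05, "演出": 0.9, "剧目": 0.05},  # "演出" is also the verb "perform"
}

words = ["剧团", "演出", "剧目"]  # one candidate segmentation
log_prob, tags = viterbi(words, states, start_p, trans_p, emit_p)
print(tags, math.exp(log_prob))
```

In this toy model the ambiguous word "演出" is tagged as a verb because the noun-to-verb transition and the verb emission together outweigh the noun reading. A real tagger would estimate the three probability tables from an annotated corpus.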

