中文专利中有标记并列结构的自动识别研究
Research on Automatic Identification of Marked Parallel Structures in Chinese Patent
Document type: Journal article
Chinese title: 中文专利中有标记并列结构的自动识别研究
English title: Research on Automatic Identification of Marked Parallel Structures in Chinese Patent
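Authors: 刘小蝶[1]; 朱筠[2]; 晋耀红[2]
First author: 刘小蝶
Affiliations: [1] College of International Exchange, Beijing Union University (北京联合大学国际交流学院); [2] Institute of Chinese Information Processing, Beijing Normal University (北京师范大学中文信息处理研究所)
First affiliation: College of International Exchange, Beijing Union University (北京联合大学国际交流学院)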
Year: 2018
Volume: 44
Issue: 6
Pages: 162-168
Journal name (Chinese): 计算机工程
Journal name (English): Computer Engineering
Indexed in: CSTPCD; Scopus; PKU Core (北大核心): [PKU Core 2017]; CSCD: [CSCD_E2017_2018]
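Funding: National High Technology Research and Development Program of China (863 Program) project "Development of Multi-level Knowledge Representation for Massive Text and a Chinese Text Understanding Application System" (2012AA011104); State Language Commission "Twelfth Five-Year Plan" research planning project "Research on Planning for Language Resource Construction" (YB125-124)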
Language: Chinese
Chinese keywords: 基于规则; 边界感知; 并列结构; 机器翻译; 专利文献
English keywords: rule-based; boundary perception; parallel structure; machine translation; patent documentation
Abstract: Nominal marked parallel structures (Coordination with Overt Conjunctions, COCs) are widely distributed and structurally complex in Chinese patent texts. Existing recognition techniques draw on only a limited set of features and handle only some simple types of parallel structure, so overall recognition performance is poor. To address this, a recognition method based on the boundary-perception principle is proposed. Building on Hierarchical Network of Concepts (HNC) theory, parallel structures are annotated along eight dimensions: number, level, semantic type, semantic features, interference features, structural features, external environment, and position features. The semantic features, structural features, and external-word (contextual) features are then examined and summarized, and 217 formal rules are formulated and integrated into an existing HNC translation system. Results of an open test show that, compared with the Google online translation system, the method recognizes marked parallel structures with higher accuracy.
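To make the idea of rule-based boundary recognition concrete, the following is a minimal sketch of a single toy rule. It is not one of the paper's 217 rules and does not use the HNC system: the marker set, the coarse part-of-speech tags, the `Token` class, and the `find_coordination` function are all assumptions introduced here for illustration. The rule locates an overt conjunction and extends each conjunct outward until it meets a non-nominal token, which it treats as the outer boundary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical marker set and tag set; not taken from the paper's rule base.
CONJUNCTIONS = {"和", "与", "及", "或"}
NOMINAL_TAGS = {"n", "a"}  # nouns and adjectival modifiers may sit inside a conjunct


@dataclass
class Token:
    word: str
    pos: str  # coarse POS tag, e.g. "n", "v", "p", "c", "u"


def find_coordination(tokens: List[Token]) -> Optional[Tuple[int, int, int]]:
    """Return (left_start, conj_index, right_end), inclusive, for the first
    marked coordination with a non-empty conjunct on both sides, else None."""
    for i, tok in enumerate(tokens):
        if tok.word not in CONJUNCTIONS:
            continue
        # Extend left while the preceding token is nominal.
        left = i
        while left - 1 >= 0 and tokens[left - 1].pos in NOMINAL_TAGS:
            left -= 1
        # Extend right while the following token is nominal.
        right = i
        while right + 1 < len(tokens) and tokens[right + 1].pos in NOMINAL_TAGS:
            right += 1
        # Accept only if both conjuncts contain at least one token.
        if left < i < right:
            return (left, i, right)
    return None


# "连接 电机 和 减速器 的 支架": the verb and the particle act as outer boundaries.
sample = [
    Token("连接", "v"), Token("电机", "n"), Token("和", "c"),
    Token("减速器", "n"), Token("的", "u"), Token("支架", "n"),
]
print(find_coordination(sample))  # → (1, 2, 3)
```

A single surface rule like this fails on nested or modified conjuncts, which is presumably why the method described in the abstract combines semantic, structural, and contextual features across many rules.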