基于多特征多分类器集成的专利自动分类研究
Patent Classification Based on Multi-feature and Multi-classifier Integration
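Document type: Journal article
Chinese title: 基于多特征多分类器集成的专利自动分类研究
English title: Patent Classification Based on Multi-feature and Multi-classifier Integration
Authors: Jia Shanshan (贾杉杉)[1]; Liu Chang (刘畅)[2]; Sun Lianying (孙连英)[3]; Liu Xiaoan (刘小安)[1]; Peng Tao (彭涛)[2]
First author: Jia Shanshan (贾杉杉)
Affiliations: [1] Smart City College, Beijing Union University; [2] College of Robotics, Beijing Union University; [3] College of Urban Rail Transit and Logistics, Beijing Union University
First affiliation: Smart City College, Beijing Union University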
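Year: 2017
Volume: 1
Issue: 8
Pages: 76-84
Journal name (Chinese): 数据分析与知识发现
Journal name (English): Data Analysis and Knowledge Discovery
Indexed in: CSTPCD; National Social Sciences Database (国家哲学社会科学学术期刊数据库); CSSCI (2017-2018); CSCD (2017-2018)
Funding: This work is one of the research outcomes of the National Key Research and Development Program project "Public Safety Risk Prevention and Control and Emergency Technology Equipment" (Grant No. 2016YFC0802107) and the General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (Grant No. SQKM201411417013).
Language: Chinese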
Chinese keywords (translated): patent classification; paragraph vector; topic vector; classifier integration
English keywords: Patent Classification; Document Vector; Topic Model Vector; Classifier Integration
Abstract: [Objective] To assign IPC classification codes to patent applications accurately, this paper proposes an automatic patent classification method based on the integration of multiple features and multiple classifiers. [Methods] Four kinds of features are extracted from patent applications: TFIDF features over the full dictionary, TFIDF features over an information-gain dictionary, paragraph vector features, and topic model vector features. These features are used to train Naive Bayes (NB), support vector machine (SVM), and AdaBoost classifiers. A feature-class matrix is built from the classifier outputs and combined with an F1 weight matrix to obtain the final predicted IPC code. [Results] The method was tested on 10 subclasses in the "engines or pumps" field, using patents from 2014 to 2016. Under the Top Prediction, All Categories, and Two Guesses evaluation measures, accuracy was 78.9%, 80.1%, and 91.2%, respectively. [Limitations] Training used only three years of patent data (2014 to 2016), so the corpus is limited in size. [Conclusions] In the "engines or pumps" field, the proposed method effectively improves the accuracy of patent text classification.
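The integration step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the feature-class matrix and the F1 weight matrix are combined, so everything below is an assumption for illustration, not the authors' implementation: each (feature, classifier) pair is treated as one model, each model votes for one class per document, and the vote is weighted by that model's F1 score for the voted class on held-out data. The function names are hypothetical, and the evaluation helpers assume the definitions commonly used in patent classification work (Top Prediction and Two Guesses compare the top one or two predictions with the main IPC code; All Categories compares the top prediction with every IPC code assigned to the document).

```python
import numpy as np


def f1_per_class(y_true, y_pred, n_classes):
    """Per-class F1 of one model on held-out data (one row of the weight matrix)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1 = np.zeros(n_classes)
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom > 0 else 0.0
    return f1


def weighted_vote(preds, weights, n_classes):
    """Combine model votes into class scores.

    preds:   (n_models, n_docs) integer class predictions, one row per
             (feature, classifier) pair -- an assumed layout.
    weights: (n_models, n_classes) per-class F1 of each model.
    Returns a (n_docs, n_classes) score matrix.
    """
    preds = np.asarray(preds)
    weights = np.asarray(weights, dtype=float)
    n_models, n_docs = preds.shape
    scores = np.zeros((n_docs, n_classes))
    docs = np.arange(n_docs)
    for m in range(n_models):
        # Each model adds the F1 weight of the class it voted for.
        scores[docs, preds[m]] += weights[m, preds[m]]
    return scores


def top_k_accuracy(scores, y_main, k):
    """k=1 is Top Prediction, k=2 is Two Guesses (main IPC code only)."""
    ranked = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    return float(np.mean([y in row for y, row in zip(y_main, ranked)]))


def all_categories_accuracy(scores, label_sets):
    """Top prediction counts as correct if it matches any assigned IPC code."""
    top = np.argmax(scores, axis=1)
    return float(np.mean([t in labels for t, labels in zip(top, label_sets)]))
```

With four feature types and three classifiers, `preds` would have 12 rows under this layout; `np.argmax(scores, axis=1)` then gives the integrated prediction for each document.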