Details

基于单边选择链和样本分布密度融合机制的非平衡数据挖掘方法 (EI indexed)

A Data Mining Method for Imbalanced Datasets Based on One-Sided Link and Distribution Density of Instances

Document type: Journal article

Chinese title: 基于单边选择链和样本分布密度融合机制的非平衡数据挖掘方法

English title: A Data Mining Method for Imbalanced Datasets Based on One-Sided Link and Distribution Density of Instances

Authors: Zhai Yun (翟云)[1,2]; Wang Shupeng (王树鹏)[3]; Ma Nan (马楠)[4]; Yang Bingru (杨炳儒)[2]; Zhang Dezheng (张德政)[2]

First author: Zhai Yun (翟云)

Corresponding author: Wang, Shu-Peng

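Affiliations: [1] E-Government Research Center, Chinese Academy of Governance (国家行政学院电子政务研究中心); [2] School of Computer and Communication Engineering, University of Science and Technology Beijing (北京科技大学计算机与通信工程学院); [3] Institute of Information Engineering, Chinese Academy of Sciences (中国科学院信息工程研究所); [4] College of Information Technology, Beijing Union University (北京联合大学信息学院)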

First affiliation: E-Government Research Center, Chinese Academy of Governance, Beijing 100089

Year: 2014

Volume: 42

Issue: 7

Pages: 1311-1319

Chinese journal name: 电子学报

Foreign-language journal name: Acta Electronica Sinica

Indexed in: CSTPCD; EI (accession no. 20144200113026); Scopus (accession no. 2-s2.0-84907876954); PKU Core Journals: [PKU Core 2011]; CSCD: [CSCD2013_2014]

Funding: National Natural Science Foundation of China (No. 61300078; No. 61271275); Chinese Academy of Governance research tender project (No. 2012ZBKT016)

Language: Chinese

Chinese keywords: 非平衡数据分类; 单边选择链; 分布密度; 重采样

English keywords: classification in imbalanced datasets; one-sided link; distribution density; resample

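Abstract: Classification of imbalanced datasets is a major challenge in machine learning. The Synthetic Minority Over-Sampling Technique (SMOTE) has become an effective and widely adopted tool for this problem. However, when generating new samples, SMOTE synthesizes them from all minority-class samples, which creates an overfitting bottleneck. To better address this problem, a new data mining method for imbalanced data based on the one-sided selection link and sample distribution density (One-Sided Link & Distribution Density-SMOTE, OSLDD-SMOTE) is proposed. OSLDD-SMOTE uses the one-sided selection link to pick out the minority-class samples that lie on the classification boundary and generates new samples according to the dynamic distribution density of those samples. The effect of the sample synthesis degree on the number of nodes and on minority-class accuracy is then analyzed, and the classification performance of OSLDD-SMOTE is compared with that of similar methods on three metrics: G-mean, F-measure, and AUC. Experimental results show that OSLDD-SMOTE effectively improves the classification accuracy of minority-class samples.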
Classification in imbalanced datasets poses a great challenge to the machine learning field, where the synthetic minority over-sampling technique (SMOTE) has become a powerful and widely adopted method. However, when generating new instances, SMOTE uses all instances in the minority class and therefore suffers from over-generalization. To better solve this problem, a data mining method for imbalanced datasets based on one-sided link and distribution density of the minority class (OSLDD-SMOTE) is proposed in this paper. OSLDD-SMOTE first selects the minority instances near the classification boundary using the one-sided link, then generates new instances with SMOTE based on the dynamic distribution density of these instances. The effects of the synthesis degree on the newly generated instances and on minority-class accuracy are compared across OSLDD-SMOTE, SMOTE, Borderline-SMOTE, and Surrounding-SMOTE. Furthermore, in simulation results on 8 UCI datasets, the proposed method achieves the most accurate and robust performance on the G-mean, F-measure, and AUC metrics.
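
The abstract refers to SMOTE-style synthesis and to the G-mean and F-measure metrics without giving formulas. The Python sketch below shows the standard textbook definitions only, as an illustration. It is not the OSLDD-SMOTE procedure: this record does not specify how the one-sided link selects boundary instances or how the distribution density weights the synthesis. The function names and the confusion counts are hypothetical.

```python
import math
import random


def smote_interpolate(x, neighbor, rng=random):
    """Plain SMOTE step: a random point on the segment from x to neighbor.

    Assumption: x and neighbor are minority-class feature vectors of equal length.
    """
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


def g_mean(tp, fn, tn, fp):
    """Geometric mean of minority-class recall and majority-class recall.

    Assumes both classes are present, so neither denominator is zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def f_measure(tp, fn, fp):
    """Harmonic mean of precision and recall for the minority class.

    Assumes tp > 0, so precision and recall are both defined and nonzero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical confusion counts, chosen only for illustration.
    tp, fn, tn, fp = 8, 2, 81, 9
    print(round(g_mean(tp, fn, tn, fp), 4))
    print(round(f_measure(tp, fn, fp), 4))
    # One synthetic point between two hypothetical minority instances.
    print(smote_interpolate([0.0, 0.0], [1.0, 2.0], random.Random(0)))
```

Both metrics are computed with the minority class as the positive class, which is why they suit imbalanced data better than overall accuracy: a classifier that ignores the minority class scores zero on G-mean regardless of how many majority instances it labels correctly.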
