
Details

An Effective Over-sampling Method for Imbalanced Data Sets Classification (indexed in SCI-EXPANDED and EI)

Document type: Journal article

Chinese title: An Effective Over-sampling Method for Imbalanced Data Sets Classification

English title: An Effective Over-sampling Method for Imbalanced Data Sets Classification

Authors: Zhai Yun[2,3]; Ma Nan[1,2]; Ruan Da[4,5]; An Bing[3]

First author: Zhai Yun

Corresponding author: Ma, N[1]

Affiliations: [1] Beijing Union Univ, Informat Coll, Beijing 100101, Peoples R China; [2] Univ Sci & Technol Beijing, Sch Informat Engn, Beijing 100083, Peoples R China; [3] Liaocheng Univ, Sch Comp Sci, Liaocheng 252059, Peoples R China; [4] Univ Ghent, Dept Appl Math & Comp Sci, B-9000 Ghent, Belgium; [5] CEN SCK, Belgian Nucl Res Ctr, B-2400 Mol, Belgium

First affiliation: Smart City College, Beijing Union University

Corresponding affiliation: [1] Beijing Union Univ, Informat Coll, Beijing 100101, Peoples R China | [1141734] Smart City College, Beijing Union University; [11417] Beijing Union University

Year: 2011

Volume: 20

Issue: 3

Pages: 489-494

Journal name (Chinese): 电子学报:英文版

Journal name (English): CHINESE JOURNAL OF ELECTRONICS

Indexed in: CSTPCD; EI (Accession No. 20113114193045); Scopus (Accession No. 2-s2.0-79960808954); WOS: SCI-EXPANDED (Accession No. WOS:000292996900019)

Funding: This work is supported in part by the National Natural Science Foundation of China (No.60675030, No.60875029), Funding Project for Academic Human Resources Development (No.PHR(IHLB) 2010).

Language: English

Chinese keywords: data sets; imbalance; sampling methods; classification; predictive accuracy; over-sampling technique; real world; noisy data

English keywords: Data mining; Classification; Imbalanced data sets; Selection strategy; Distribution density; Oversample

Abstract: Imbalanced data sets in real-world applications have a majority class with normal instances and a minority class with abnormal or important instances. Learning from such data sets usually generates biased classifiers that have a higher predictive accuracy over the majority class but a considerably poorer predictive accuracy over the minority class. The Synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced data sets. This paper presents a novel approach for learning from imbalanced data sets, based on an improved SMOTE algorithm. The approach deals with noisy data through a hierarchical filtering mechanism, employs a selection strategy for the minority instances, and makes full use of the dynamic distribution density of the minority class; these steps are followed by the SMOTE process. The empirical analysis shows that the approach is quantitatively competitive with SMOTE and a series of its improved variants in terms of the receiver operating characteristic curve when applied to several highly and moderately imbalanced data sets.
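For orientation, the baseline SMOTE step that the abstract builds on can be sketched roughly as follows. This is a minimal illustration of plain SMOTE-style interpolation only, not the paper's improved method: the hierarchical noise filtering, the minority selection strategy, and the density-based weighting are not described in this record and are not reproduced here. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for the example. As a simplification, base instances are drawn at random rather than each minority instance being oversampled a fixed number of times.

```python
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating between a randomly
    chosen minority instance and one of its k nearest minority neighbours.

    Hypothetical helper for illustration; not the algorithm from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points

    def sq_dist(a, b):
        # Squared Euclidean distance; sufficient for ranking neighbours.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank all other minority instances by distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority instances, the new points stay inside the region already covered by the minority class. That is also why noisy minority instances can be harmful: interpolating toward them spreads the noise, which is the kind of issue the abstract's filtering step is meant to address.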

