On Feature Selection and Its Application to Twenty Newsgroups Text Classification    

Document type: Conference paper

English title: On Feature Selection and Its Application to Twenty Newsgroups Text Classification

Authors: Du, Mei[1]; Liang, Yan[2]; Zhao, Lu[2]

First author: Du Mei (杜梅)

Corresponding author: Du, M[1]

Affiliations: [1] Beijing Union Univ, Coll Management, Beijing, Peoples R China; [2] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA

First affiliation: College of Management, Beijing Union University (北京联合大学管理学院)

Corresponding affiliation: [1] Beijing Union Univ, Coll Management, Beijing, Peoples R China (corresponding author); College of Management, Beijing Union University; Beijing Union University

Conference proceedings: 2nd International Conference on Education and Management Science (ICEMS)

Conference date: May 28-29, 2016

Conference location: Beijing, Peoples R China

Language: English

Keywords: Text classification; Feature selection; Dimensionality reduction

Abstract: Feature selection is commonly used to deal with the high dimensionality of the feature space in text classification. In this paper, we consider three feature selection criteria: document frequency, mutual information, and the Chi-Square statistic. The general selection process is greedy: at each step, the new feature with the highest score (e.g., the Chi-Square statistic) conditioned on the subset of features already selected is chosen. We propose using a simulated annealing algorithm to select the new feature at each step. We explore the relationship between the results of the greedy and simulated annealing algorithms, and the extent to which the feature space can be reduced without significantly damaging classification accuracy. Our experimental results show that the simulated annealing algorithm is more effective than the greedy algorithm, and that with a very small number of features (about 0.1%) we can achieve about 60% of the test accuracy obtained using the entire feature space.
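The selection scheme described in the abstract can be illustrated with a small sketch. This record does not include the paper's actual procedure, so everything below is an assumption made for illustration: the function names (`chi_square`, `term_scores`, `greedy_select`, `annealed_select`) are hypothetical, the Chi-Square score is the standard 2x2 contingency form computed per term without conditioning on already-selected features, and the annealing schedule (`t0`, `cooling`, `steps`) is arbitrary.

```python
import math
import random


def chi_square(a, b, c, d):
    """Standard chi-square statistic for a term/class 2x2 table.

    a: docs in the class that contain the term
    b: docs outside the class that contain the term
    c: docs in the class that lack the term
    d: docs outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: term or class is constant
    return n * (a * d - c * b) ** 2 / denom


def term_scores(docs, labels, target):
    """Score every term against one target class.

    docs is a list of term sets, labels the matching class labels.
    Assumption: scores are unconditional (not conditioned on a selected subset).
    """
    in_class = sum(1 for y in labels if y == target)
    scores = {}
    for term in sorted(set().union(*docs)):
        a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == target)
        b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != target)
        c = in_class - a
        d = len(docs) - a - b - c
        scores[term] = chi_square(a, b, c, d)
    return scores


def greedy_select(scores, k):
    """Greedy baseline: at each step take the highest-scoring remaining term."""
    selected, remaining = [], set(scores)
    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=scores.get)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


def annealed_select(scores, k, rng, t0=1.0, cooling=0.9, steps=50):
    """Pick each new term with a short simulated annealing run (hypothetical schedule)."""
    selected, remaining = [], sorted(scores)
    while remaining and len(selected) < k:
        current = rng.choice(remaining)
        temp = t0
        for _ in range(steps):
            proposal = rng.choice(remaining)
            delta = scores[proposal] - scores[current]
            # Always accept improvements; accept worse moves with prob. exp(delta / temp).
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current = proposal
            temp *= cooling  # geometric cooling
        selected.append(current)
        remaining.remove(current)
    return selected


# Toy corpus (made up for illustration, not data from the paper).
docs = [{"ball", "team"}, {"ball", "game"}, {"cpu", "game"}, {"cpu", "disk"}]
labels = ["sport", "sport", "tech", "tech"]
scores = term_scores(docs, labels, "sport")
print(greedy_select(scores, 2))
print(annealed_select(scores, 2, random.Random(0)))
```

With unconditional scores the greedy pass is simply a top-k ranking; the annealed pass differs only in that it may occasionally accept a lower-scoring term early on, while the temperature is still high. A conditional score, as the abstract describes, would replace the `scores` lookup with a function of the already-selected subset.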
