Details
On Feature Selection and Its Application to Twenty Newsgroups Text Classification
Document type:Conference paper
English title:On Feature Selection and Its Application to Twenty Newsgroups Text Classification
Authors:Du, Mei[1];Liang, Yan[2];Zhao, Lu[2]
First author:Du, Mei
Corresponding author:Du, M[1]
Affiliations:[1]Beijing Union Univ, Coll Management, Beijing, Peoples R China;[2]SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
First institution:College of Management, Beijing Union University
Corresponding institution:[1]Beijing Union Univ, Coll Management, Beijing, Peoples R China (corresponding author)
Conference proceedings:2nd International Conference on Education and Management Science (ICEMS)
Conference dates:May 28-29, 2016
Conference location:Beijing, Peoples R China
Language:English
Keywords:Text classification; Feature selection; Dimensionality reduction
Abstract:Feature selection is commonly used to deal with the high dimensionality of the feature space in text classification. In this paper, we consider three feature selection criteria: document frequency, mutual information, and the Chi-Square statistic. Selection is usually done in a greedy manner: at each step, the new feature with the highest score (e.g., the Chi-Square statistic), conditional on the subset of features already selected, is added. We propose using a simulated annealing algorithm to select the new feature at each step. We explore the relationship between the results of the greedy and simulated annealing algorithms, and the extent to which the feature space can be reduced without significantly damaging classification accuracy. Our experimental results show that the simulated annealing algorithm is more effective than the greedy algorithm, and that with a very small fraction of the features (about 0.1%) we can achieve about 60% of the test accuracy obtained with the entire feature space.
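Illustrative note: the Chi-Square criterion named in the abstract can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' procedure: it uses the standard two-by-two contingency form of the statistic on term presence, ranks terms by their maximum score over classes, and keeps the top k. It does not implement the conditional greedy step or the simulated annealing search described in the abstract. The function names `chi_square` and `top_k_features` and the toy documents are hypothetical.

```python
from typing import List, Sequence, Set


def chi_square(docs: Sequence[Set[str]], labels: Sequence[int],
               term: str, cls: int) -> float:
    """Chi-square statistic between presence of `term` and membership in `cls`."""
    a = b = c = d = 0
    for words, label in zip(docs, labels):
        has_term = term in words
        in_cls = label == cls
        if has_term and in_cls:
            a += 1          # in class, contains term
        elif has_term:
            b += 1          # not in class, contains term
        elif in_cls:
            c += 1          # in class, lacks term
        else:
            d += 1          # not in class, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Term appears in every document or in none: it carries no signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def top_k_features(docs: Sequence[Set[str]], labels: Sequence[int],
                   k: int) -> List[str]:
    """Rank terms by their maximum chi-square score over classes; keep k."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    scores = {t: max(chi_square(docs, labels, t, c) for c in classes)
              for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical): two documents per class, as sets of words.
docs = [
    {"hockey", "game", "the"},
    {"hockey", "team", "the"},
    {"space", "orbit", "the"},
    {"space", "launch", "the"},
]
labels = [0, 0, 1, 1]

selected = top_k_features(docs, labels, 2)
```

On this toy corpus the class-specific words are ranked first, while a word that occurs in every document scores zero and is dropped.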