Details
Research on Adaptive Genetic Algorithm in Application of Focused Crawler Search Strategy
Document type: Journal article
Chinese title: 自适应遗传算法在主题爬虫搜索策略中的应用研究
English title: Research on Adaptive Genetic Algorithm in Application of Focused Crawler Search Strategy
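Authors: 荆文鹏 (Jing Wenpeng) [1]; 王育坚 (Wang Yujian) [1]; 董伟伟 (Dong Weiwei) [1]
First author: 荆文鹏 (Jing Wenpeng)
Institution: [1] College of Information, Beijing Union University (北京联合大学信息学院)
First institution: College of Smart City, Beijing Union University (北京联合大学智慧城市学院)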
Year: 2016
Volume: 43
Issue: 8
Pages: 254-257
Chinese journal title: 计算机科学
English journal title: Computer Science
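Indexed in: CSTPCD; Peking University Core Journals (2014 edition); CSCD (CSCD_E2015_2016)
Funding: National Natural Science Foundation of China project "Research on Image Semi-Structuring Based on Hypergraphic XGML" (61271369)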
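Language: Chinese
Chinese keywords: 主题爬虫 (focused crawler); 重要度 (importance degree); 遗传算法 (genetic algorithm); 遗传算子 (genetic operators); 适应度函数 (fitness function)
English keywords: Focused crawler, Important degree, Genetic algorithm, Genetic operators, Fitness function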
Abstract: Designing a search strategy that improves crawler coverage and accuracy is a major research topic for focused crawlers. Most focused crawlers currently use a best-first search strategy, which tends to become trapped in local optima. To address this weakness, we design a focused-crawler search strategy that incorporates a genetic algorithm, together with a dynamic fitness function and genetic operators that give the crawler a degree of adaptivity. Experiments comparing this strategy with other search strategies and with a strategy based on a non-adaptive (traditional) genetic algorithm show that it improves crawler performance to some extent.
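The abstract names a dynamic fitness function and adaptive genetic operators but does not give their formulas, so the sketch below is not the authors' method. It illustrates one well-known adaptive scheme, the crossover and mutation probabilities of Srinivas and Patnaik (1994), which lower pc and pm for above-average individuals and keep them at a fixed rate for below-average ones. Everything else is a hypothetical assumption for illustration: the encoding (a 0/1 vector choosing which candidate frontier links to fetch), the toy `fitness` with a fetch `budget`, the `relevance` scores, and the constants `K1` to `K4`.

```python
import random
from typing import List, Sequence

# Hypothetical constants, NOT values from the paper. K1/K3 scale the crossover
# probability; K2/K4 scale the per-gene mutation probability.
K1, K2, K3, K4 = 0.9, 0.1, 0.9, 0.1


def adaptive_pc(f_prime: float, f_max: float, f_avg: float) -> float:
    """Crossover probability; f_prime is the larger fitness of the two parents."""
    if f_prime < f_avg or f_max <= f_avg:
        return K3  # below-average pair or fully converged population: fixed rate
    return K1 * (f_max - f_prime) / (f_max - f_avg)


def adaptive_pm(f: float, f_max: float, f_avg: float) -> float:
    """Per-gene mutation probability for an individual with fitness f."""
    if f < f_avg or f_max <= f_avg:
        return K4
    # Clamp at 0: a child can beat the previous generation's f_max.
    return max(0.0, K2 * (f_max - f) / (f_max - f_avg))


def fitness(genes: Sequence[int], relevance: Sequence[float], budget: int) -> float:
    """Hypothetical toy fitness: summed relevance of the selected links,
    or 0.0 when more links are selected than the fetch budget allows."""
    if sum(genes) > budget:
        return 0.0
    return sum(r for g, r in zip(genes, relevance) if g)


def evolve(relevance: Sequence[float], budget: int, pop_size: int = 20,
           generations: int = 50, seed: int = 0) -> List[int]:
    """Return the best 0/1 link-selection vector found by the adaptive GA."""
    rng = random.Random(seed)
    n = len(relevance)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda g: fitness(g, relevance, budget))[:]
    for _ in range(generations):
        scores = [fitness(g, relevance, budget) for g in pop]
        f_max, f_avg = max(scores), sum(scores) / len(scores)

        def pick() -> int:
            # Binary tournament selection: index of the fitter of two random picks.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return a if scores[a] >= scores[b] else b

        children: List[List[int]] = []
        while len(children) < pop_size:
            i, j = pick(), pick()
            c1, c2 = pop[i][:], pop[j][:]
            pc = adaptive_pc(max(scores[i], scores[j]), f_max, f_avg)
            if n > 1 and rng.random() < pc:
                cut = rng.randrange(1, n)  # single-point crossover
                c1[cut:], c2[cut:] = pop[j][cut:], pop[i][cut:]
            for child in (c1, c2):
                pm = adaptive_pm(fitness(child, relevance, budget), f_max, f_avg)
                for k in range(n):
                    if rng.random() < pm:
                        child[k] ^= 1  # bit-flip mutation
                children.append(child)
        pop = children[:pop_size]
        pop[0] = best[:]  # elitism: carry the best-so-far individual forward
        for g in pop:
            if fitness(g, relevance, budget) > fitness(best, relevance, budget):
                best = g[:]
    return best


if __name__ == "__main__":
    rel = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]  # made-up relevance scores
    chosen = evolve(rel, budget=3)
    print(chosen, fitness(chosen, rel, 3))
```

Lowering pc and pm as an individual approaches f_max protects good solutions, while the fixed rates for below-average individuals keep the search exploring. This is the usual argument for why adaptive operators can help a search escape the local optima that the abstract attributes to best-first crawling.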