Details
Document type: Journal article
Chinese title: 一种挖掘压缩序列模式的有效算法
English title: An Efficient Algorithm for Mining Compressed Sequential Patterns
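Authors: Tong Yongxin (童咏昕)[1]; Zhang Yuanyuan (张媛媛)[2]; Yuan Mei (袁玫)[3]; Ma Shilong (马世龙)[1]; Yu Dan (余丹)[1]; Zhao Li (赵莉)[1]
First author: Tong Yongxin (童咏昕)
Corresponding author: Tong, Y.
Affiliations: [1] State Key Laboratory of Software Development Environment, Beihang University; [2] China Academy of Telecommunication Technology; [3] College of Information Technology, Beijing Union University
First affiliation: State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191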
Year: 2010
Issue: 1
Pages: 72-80
Journal name (Chinese): 计算机研究与发展
Journal name (English): Journal of Computer Research and Development
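Indexed in: CSTPCD; EI (accession number: 20101512841039); Scopus (record number: 2-s2.0-77950591095); PKU Core Journals (2008 edition); CSCD (2011-2012)
Funding: National Basic Research Program of China (973 Program) (2005CB321902); Science and Technology Program of the Beijing Municipal Commission of Education (KM200911417003)
Language: Chinese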
Keywords (Chinese): 挖掘序列模式; 压缩; 频繁模式挖掘; 关联规则; 数据挖掘
Keywords (English): mining sequential patterns; compression; frequent pattern mining; association rules; data mining
Abstract: Mining frequent sequential patterns from sequence databases is a central research topic in data mining, and many efficient algorithms for it have been proposed and studied. Because the mining process generates a huge number of frequent sequential patterns, many researchers have recently shifted their attention from the efficiency of mining algorithms to making the resulting pattern set easier for users to understand. This paper studies the problem of compressing frequent sequential patterns. Inspired by the idea of compressing frequent itemsets, it proposes an algorithm, CFSP (compressing frequent sequential patterns), that mines a small number of representative sequential patterns to express the information of all frequent sequential patterns while eliminating a large number of redundant ones. CFSP is a two-step algorithm: in the first step, it obtains all closed sequential patterns as the candidate set of representative patterns and, at the same time, obtains most of the representative patterns; in the second step, it spends only a little time finding the remaining representative patterns. An empirical study on both real and synthetic data sets shows that CFSP is efficient.
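Illustrative note: the abstract says the first step of CFSP collects all closed sequential patterns as candidates. The sketch below is not the CFSP algorithm and is not taken from the paper. It only illustrates, by brute force on a toy database, what a closed sequential pattern is under the usual definition (a frequent pattern with no proper super-sequence of equal support). It assumes each sequence element is a single item, and all function names and data are hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Count the sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_patterns(database, min_support):
    """Brute force: test every distinct subsequence of every sequence.

    Exponential in sequence length, so suitable for toy data only.
    """
    candidates = set()
    for seq in database:
        for length in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), length):
                candidates.add(tuple(seq[i] for i in idx))
    result = {}
    for pattern in candidates:
        count = support(pattern, database)
        if count >= min_support:
            result[pattern] = count
    return result


def closed_patterns(frequent):
    """Keep patterns with no longer super-sequence of the same support."""
    closed = {}
    for p, s in frequent.items():
        absorbed = any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in frequent.items()
        )
        if not absorbed:
            closed[p] = s
    return closed


# Hypothetical toy database: each string is one sequence of single items.
database = ["abc", "abc", "ab", "ac"]
frequent = frequent_patterns(database, min_support=2)
closed = closed_patterns(frequent)
print(len(frequent), "frequent patterns,", len(closed), "closed patterns")
```

On this toy input, seven frequent patterns reduce to four closed ones. For example, `b` is dropped because `ab` has the same support. According to the abstract, CFSP then selects representative patterns from such a candidate set; that selection step is not sketched here.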