Details
Research on Key Technology of Network Information Extraction Oriented to Web Topic Detection for Big Data (indexed in SCI-EXPANDED)
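Document type: Journal article
English title: Research on Key Technology of Network Information Extraction Oriented to Web Topic Detection for Big Data
Authors: Chen, Mo[1]
First author: 陈默 (Chen, Mo)
Corresponding author: Chen, M[1]
Affiliation: [1]Beijing Union Univ, Business Coll, A3 Yanjingdongli, Beijing 100025, Peoples R China
First affiliation: Business College, Beijing Union University (北京联合大学商务学院)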
Corresponding affiliation: [1](corresponding author), Beijing Union Univ, Business Coll, A3 Yanjingdongli, Beijing 100025, Peoples R China. | [1141721] Business College, Beijing Union University (北京联合大学商务学院); [11417] Beijing Union University (北京联合大学)
Year: 2026
Volume: 23
Issue: 2
Pages: 775-800
Journal title: COMPUTER SCIENCE AND INFORMATION SYSTEMS
Indexed in: WOS [SCI-EXPANDED (accession number: WOS:001760869100006)]
Funding: This paper is supported by the General Project of the Science and Technology Plan of Beijing Municipal Education Commission under Grant No. KM202011417011, the Research Project on Graduate Education Science at Beijing Union University in 2025 under Grant No. YK202502, and the Support Project of High-Level Teachers in Beijing Municipal Universities in the Period of the 13th Five-Year Plan under Grant No. CIT&TCD201704072.
Language: English
Keywords: Incremental Network Information Extraction; Big Data; Web Topic Detection
Abstract: In today's era of big data and numerical intelligence, this study explores an incremental network information extraction technology for Web topic detection, taking semi-structured and unstructured Web big data as its main research object, in order to promote network information detection applications. The proposed approach comprises two algorithms. First, a theme similarity measurement algorithm for incremental network information extraction extracts Web instances related to a theme and calculates their importance. Second, an incremental instance extraction algorithm for Web topic detection analyzes Pattern and BasePattern from the URLs of the extracted Web instances, segments the title and text content of each instance, and extracts keywords capable of describing the Web topic. Experimental results demonstrate that the framework, method, and algorithm proposed in this paper significantly outperform traditional methods in network information extraction. In particular, the accuracy of extracted Web instances similar to the theme can reach 0.833; the F-Measure of extracted Web instances similar to the theme is close to 0.83 under different threshold adjustments; and the accuracy of topic detection is close to 0.82 once the number of extracted Web news instances, the threshold, and the parameter value are fixed. The study concludes that the proposed incremental network information extraction idea is feasible, verifiable, and superior, and that it can play an important role in reconfiguring numerical intelligence warehouses for detecting Web topics and inferring the propagation paths of hierarchical Web big data.
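Note: The abstract reports results for threshold-based extraction of theme-related Web instances and evaluates them with an F-Measure, but it does not give the similarity formula. The sketch below is only an illustration of that general workflow under stated assumptions: it substitutes plain cosine similarity over term counts for the paper's theme similarity measurement, and uses the balanced F-measure (F1). All function names, the toy data, and the threshold value are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors.

    Assumption: a stand-in for the paper's theme similarity measure,
    which the abstract does not specify.
    """
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def extract_related(theme_terms, instances, threshold):
    """Return ids of instances whose similarity to the theme meets the threshold."""
    theme_vec = Counter(theme_terms)
    selected = set()
    for instance_id, tokens in instances.items():
        if cosine_similarity(theme_vec, Counter(tokens)) >= threshold:
            selected.add(instance_id)
    return selected


def precision_recall_f1(selected, relevant):
    """Precision, recall, and balanced F-measure (F1) for a set of extracted ids."""
    true_positives = len(selected & relevant)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: pre-tokenized titles keyed by an instance id.
toy_theme = ["flood", "rescue"]
toy_instances = {
    "a": ["flood", "rescue", "city"],
    "b": ["football", "match"],
    "c": ["flood", "warning"],
}
toy_selected = extract_related(toy_theme, toy_instances, threshold=0.6)
toy_scores = precision_recall_f1(toy_selected, relevant={"a", "c"})
```

Raising the threshold in a sketch like this trades recall for precision, which is why an F-Measure is typically reported across several threshold settings, as the abstract does.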
