Automatic mining system and method of news events in large-scale data

A technology for automatic mining of large-scale data, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the high computational complexity of hierarchical clustering, which makes the method hard to scale to massive data sets.

Inactive Publication Date: 2013-04-03
人民搜索网络股份公司
2 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

However, because of the nature of the algorithm, hierarchical clustering has high computational complexity and therefore scales poorly to massive data sets.




Embodiment Construction

[0028] The automatic mining system and method of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.

[0029] The present invention addresses how to mine news events promptly and accurately from large-scale news data, and proposes an automatic clustering system that mainly applies two hierarchical clustering methods of different granularities to process the data.
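The excerpt does not say how the two granularities are combined. As a rough sketch only, assuming a coarse pass over titles followed by a finer pass over body text inside each coarse group, the idea might look like this in Python. The names `single_link` and `two_level`, the Jaccard similarity, and both thresholds are illustrative assumptions, not details from the patent.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_link(items, texts, threshold):
    """Single-link clustering of the indices in `items`, cut at `threshold`.

    Cutting a single-link hierarchy at a similarity threshold yields the
    connected components of the graph that links every pair whose
    similarity is at least the threshold, so a union-find is enough.
    """
    parent = {i: i for i in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tokens = {i: set(texts[i].lower().split()) for i in items}
    for n, i in enumerate(items):
        for j in items[n + 1:]:
            if jaccard(tokens[i], tokens[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in items:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def two_level(news, coarse=0.3, fine=0.5):
    """Assumed two-granularity scheme: loose pass on titles, strict pass on bodies."""
    titles = [n["title"] for n in news]
    bodies = [n["body"] for n in news]
    result = []
    for group in single_link(list(range(len(news))), titles, coarse):
        # Refine each coarse title group using the body text.
        result.extend(single_link(group, bodies, fine))
    return result
```

In this sketch the coarse pass keeps the number of expensive body comparisons small, because bodies are compared only within a title group.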

[0030] Current event mining methods for news data process everything in one pass: all news data is classified or clustered as a single input. This approach has two drawbacks. First, centralized processing scales poorly with data size. On large-scale news data sets the processing time becomes very long, which fails the timeliness requirement of news event mining. Second, a round of centralized processing is not conducive to th...



Abstract

The invention discloses an automatic mining system and method for news events in large-scale data. The automatic mining system comprises a receiving module, a clustering processing module, an event merging module and a cache processing module. The receiving module receives the news data pushed to the event mining system during a time slice and sends the news data accumulated in the previous time slice to the clustering processing module. The clustering processing module performs hierarchical clustering on the received news data according to the similarity of news titles or body texts, in order to mine groups of news that share the same event attribute. The event merging module merges the new events formed by clustering with the historical events formed earlier, according to the similarity between events, and sends the newly formed events and the modified historical events to the cache processing module for caching. With the system and method, massive news data can be mined automatically while meeting timeliness and accuracy requirements.
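The module flow in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `EventMiner`, the Jaccard-style `similarity`, the `merge_threshold` value, and the stand-in `cluster_by_first_word` (a placeholder for the hierarchical clustering step) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: int
    docs: list = field(default_factory=list)  # news titles grouped into this event

    def tokens(self) -> set:
        return {w for d in self.docs for w in d.lower().split()}


def similarity(a: set, b: set) -> float:
    """Jaccard similarity between two token sets (assumed event similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_first_word(titles):
    """Placeholder for the clustering module: groups titles by their first word."""
    groups = {}
    for t in titles:
        groups.setdefault(tuple(t.lower().split()[:1]), []).append(t)
    return list(groups.values())


class EventMiner:
    def __init__(self, cluster_fn=cluster_by_first_word, merge_threshold=0.4):
        self.cluster_fn = cluster_fn
        self.merge_threshold = merge_threshold
        self.buffer = []    # receiving module: news of the current time slice
        self.history = []   # events formed in earlier time slices
        self.cache = {}     # cache module: event_id -> list of titles
        self._next_id = 0

    def receive(self, title):
        self.buffer.append(title)

    def end_time_slice(self):
        """Cluster the accumulated batch, merge with history, refresh the cache."""
        batch, self.buffer = self.buffer, []
        touched = []
        for group in self.cluster_fn(batch):
            new_tokens = {w for d in group for w in d.lower().split()}
            best = max(self.history,
                       key=lambda e: similarity(e.tokens(), new_tokens),
                       default=None)
            if best is not None and similarity(best.tokens(), new_tokens) >= self.merge_threshold:
                best.docs.extend(group)          # modified historical event
            else:
                best = Event(self._next_id, list(group))  # newly formed event
                self._next_id += 1
                self.history.append(best)
            touched.append(best)
        for e in touched:
            self.cache[e.event_id] = list(e.docs)
        return [e.event_id for e in touched]
```

Because each call to `end_time_slice` clusters only one slice of news and compares the resulting groups against existing events, the per-round cost depends on the slice size rather than on the full archive.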

Description

Technical field

[0001] The invention relates to the technical field of data mining, in particular to an automatic mining system and method for news events in large-scale data.

Background

[0002] With the rapid development of Internet technology, news reporting on the Internet has grown explosively. How to quickly extract the needed information from this mass of news is a problem worth studying.

[0003] Existing hierarchical clustering is a process of hierarchically merging (or decomposing) a given data set. During processing, the order of merging is determined by the degree of similarity between the data. Compared with other clustering or classification methods, hierarchical clustering has the advantage that the number of categories into which the data will be divided does not need to be known in advance, and is more suitable for the fact that the number of news e...
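The property described in [0003], merging in order of similarity without fixing the number of categories beforehand, can be illustrated with a small agglomerative sketch. It is a generic textbook-style example, not the patent's algorithm: the average-link criterion, the Jaccard similarity on title tokens, and the `threshold` stopping rule are assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerative(titles, threshold=0.5):
    """Average-link agglomerative clustering that stops at a similarity threshold.

    Returns clusters as sorted lists of indices into `titles`. The number of
    clusters is an outcome of the threshold, not an input.
    """
    tokens = [set(t.lower().split()) for t in titles]
    clusters = [[i] for i in range(len(titles))]

    def link(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        total = sum(jaccard(tokens[i], tokens[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Every round scans all cluster pairs; this repeated scan is why the
        # naive method becomes expensive on large inputs.
        (a, b), best = max(
            (((a, b), link(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # a < b, so index a is unaffected
    return clusters
```

Raising `threshold` yields more, tighter clusters; lowering it merges more aggressively. The pairwise scan in every round also shows where the high computational cost on massive data sets comes from.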


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 付万宇, 黄丛蕊, 薛飞, 徐海瑞, 杨之光, 杨青
Owner: 人民搜索网络股份公司