Topic detecting device and topic detecting method based on distributed multistage cluster

Topic detection and distributed processing technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses two problems: the detection effect needs to be improved, and the processing speed is difficult to guarantee.

Inactive Publication Date: 2012-12-19
人民搜索网络股份公司
Cites: 2 | Cited by: 57

AI Technical Summary

Problems solved by technology

[0004] However, because these methods process each document serially, it is difficult to guarantee a practical processing speed on large volumes of data unless a less complex, but also less effective, clustering algorithm is chosen.
Moreover, since no measures to effect

Examples

Embodiment Construction

[0047] The device and method for topic detection based on distributed multi-level clustering of the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments of the present invention.

[0048] Figure 1 is an overall flowchart of the topic detection method based on distributed multi-level clustering of the present invention. As shown in Figure 1, the process is executed periodically and mainly includes the following steps:

[0049] Step S1: News collection. Collect network news in real time from various websites and extract structured information.

[0050] Step S2: News classification. Automatically classify the news newly collected in this period by subject category and distribute it to the corresponding channels.

[0051] Step S3: Perform multi-level clustering on each channel in parallel. In each channel, the features of the news newly entering the channel in this period are extracte...
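Steps S1 to S3 can be illustrated with a minimal sketch of one period of the pipeline. Everything concrete here is an assumption made for illustration, not the method of the patent: the keyword table `CHANNEL_KEYWORDS`, the functions `classify`, `cluster_channel` and `run_period`, the Jaccard similarity, and the threshold value are all hypothetical. In particular, the multi-level clustering of step S3 is truncated in the text above, so a plain single-pass clustering stands in for it; only the "one clustering job per channel, run concurrently" structure follows the description.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical keyword lists standing in for a trained classifier (step S2).
CHANNEL_KEYWORDS = {
    "sports": {"match", "team", "league"},
    "finance": {"stock", "market", "bank"},
}


def classify(title):
    """Assign a title to the channel sharing the most keywords with it."""
    tokens = set(title.lower().split())
    best = max(CHANNEL_KEYWORDS, key=lambda c: len(tokens & CHANNEL_KEYWORDS[c]))
    # Fall back to a catch-all channel when no keyword matches at all.
    return best if tokens & CHANNEL_KEYWORDS[best] else "other"


def jaccard(a, b):
    """Jaccard similarity of two token sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_channel(titles, threshold=0.3):
    """Single-pass stand-in for step S3: join the first sufficiently similar cluster."""
    clusters = []  # each entry is (seed_tokens, member_titles)
    for title in titles:
        tokens = set(title.lower().split())
        for seed, members in clusters:
            if jaccard(tokens, seed) >= threshold:
                members.append(title)
                break
        else:
            clusters.append((tokens, [title]))
    return [members for _, members in clusters]


def run_period(news_titles):
    """Classify the period's news into channels, then cluster each channel concurrently."""
    channels = defaultdict(list)
    for title in news_titles:
        channels[classify(title)].append(title)
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(cluster_channel, items)
                   for name, items in channels.items()}
        return {name: future.result() for name, future in futures.items()}
```

Threads are used only to keep the sketch self-contained. Because the channels do not share state, the same structure could in principle be mapped onto separate processes or machines, which is closer to what a distributed deployment would need.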

Abstract

The invention discloses a topic detection device and a topic detection method based on distributed multistage clustering. The topic detection device mainly comprises a news acquisition module, a news classification module, a topic detection module, a topic integration module and a topic display module. The topic detection method comprises the steps of: A, acquiring news; B, classifying the newly acquired news; C, performing multistage clustering on the various channels concurrently; and D, computing the hotness of all topics and screening out the hot topics of the whole system and the hot topics of each channel. The device and method address the sharp conflict between detection quality and time cost that arises in topic detection when a large number of documents are updated rapidly in an Internet environment.
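Step D can be sketched as a ranking over per-channel topics. The abstract does not say how hotness is computed, so this sketch assumes, purely for illustration, that a topic's hotness is the number of documents it contains. The topic dictionaries and the names `hotness` and `screen_hot_topics` are likewise hypothetical.

```python
import heapq


def hotness(topic):
    # Assumption: hotness is the document count; the source does not define it here.
    return len(topic["docs"])


def screen_hot_topics(topics_by_channel, k=2):
    """Return the k hottest topics overall and the k hottest topics in each channel."""
    per_channel = {
        channel: heapq.nlargest(k, topics, key=hotness)
        for channel, topics in topics_by_channel.items()
    }
    all_topics = [t for topics in topics_by_channel.values() for t in topics]
    overall = heapq.nlargest(k, all_topics, key=hotness)
    return overall, per_channel
```

A real hotness score would likely also weigh factors such as recency or source, but nothing in the text shown here specifies them.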

Description

Technical field

[0001] The invention relates to network information analysis, text classification and text clustering technologies in text information processing, and in particular to a topic detection device and method based on distributed multi-level clustering.

Background art

[0002] With the rapid development of the Internet, the information on the Internet has become more and more diverse and abundant. At the same time, the social influence of Internet public opinion has continued to increase. Many hot social events are first disclosed and disseminated on the Internet, so its value is becoming more and more obvious. In the Internet environment there are a large number of natural-language documents, including news, blogs, forum posts and emerging microblogs. These documents provide the most basic data source for discovering hot topics.

[0003] The Topic Detection and Tracking project (TDT, Topic Detection and Track...

Application Information

IPC(8): G06F17/30
Inventor: 杨青, 李德聪
Owner: 人民搜索网络股份公司