Topic detection or tracking method for network text big data

Topic detection and big data technology, applied at the intersection of big data analysis and machine learning. It addresses the lack of an established research framework and model in the field of topic detection or tracking, and achieves strong scalability, improved throughput, and high application value.

Active Publication Date: 2015-03-25
WUHAN SHUWEI TECH
Cites: 7; Cited by: 32

AI Technical Summary

Problems solved by technology

Although these methods can improve the performance of the TDT system to a certain extent, they are only a supplement and amendment to the tra...



Examples


Embodiment Construction

[0050] To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict.

[0051] First, the technical terms used in the present invention are explained below:

[0052] Laplacian matrix: the difference between the degree matrix and the adjacency matrix, L = D - A, where the degree matrix D is a diagonal matrix containing the degree of each vertex. The Laplacian matrix is positive semi-definite, and the number of times 0 occurs as an eigenvalue is the number of connected re...
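The two properties stated above can be checked numerically. The following is a minimal sketch in dependency-free Python (no NumPy assumed) that builds L = D - A, diagonalizes it with the classical cyclic Jacobi eigenvalue method, and counts the zero eigenvalues. The `fiedler_split` function shows the textbook way spectral clustering uses the Laplacian (splitting a connected graph by the sign of the eigenvector of the second-smallest eigenvalue); it illustrates spectral clustering in general, not the clustering procedure of this patent, and all function names and example graphs are hypothetical.

```python
import math

def laplacian(adj):
    """L = D - A, where D is the diagonal degree matrix of the weighted adjacency matrix A."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[(deg[i] if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]

def jacobi_eigen(m, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a real symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, vecs); column k of vecs is the eigenvector of eigenvalues[k]."""
    n = len(m)
    a = [[float(x) for x in row] for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def count_zero_eigenvalues(adj, eps=1e-8):
    """Multiplicity of eigenvalue 0 of the Laplacian, i.e. the number of connected components."""
    values, _ = jacobi_eigen(laplacian(adj))
    return sum(1 for lam in values if abs(lam) < eps)

def fiedler_split(adj):
    """Two-way spectral partition of a connected graph by the sign of the Fiedler vector."""
    values, vecs = jacobi_eigen(laplacian(adj))
    k = sorted(range(len(values)), key=lambda i: values[i])[1]  # second-smallest eigenvalue
    pos = {i for i in range(len(adj)) if vecs[i][k] >= 0}
    return pos, set(range(len(adj))) - pos

def graph(n, edges):
    """Unweighted undirected adjacency matrix from an edge list (hypothetical helper)."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj

# Keywords 0-1-2 form one cluster and 3-4 another, with no edge between them.
two_parts = graph(5, [(0, 1), (0, 2), (1, 2), (3, 4)])
print(count_zero_eigenvalues(two_parts))  # 2

# Two triangles joined by a single bridge edge 2-3.
bridge = graph(6, [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])
print(count_zero_eigenvalues(bridge))  # 1
print(sorted(map(sorted, fiedler_split(bridge))))  # [[0, 1, 2], [3, 4, 5]]
```

A production system would use a library eigensolver such as `numpy.linalg.eigh` instead of a hand-written Jacobi loop; the loop is used here only to keep the example self-contained.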


Abstract

The invention discloses a topic detection or tracking method for network text big data. The basic idea comprises the following steps: a keyword graph model and the corresponding adjacency matrix are built by detecting the keywords that occur in different documents; the graph model and adjacency matrix are combined with spectral clustering to give a new topic detection model; the probability distribution of each document over the topics is calculated; when a new document arrives, its similarity to the set of documents attributed to each historical topic is calculated, so that topics are detected or tracked automatically; and the method is distributed using the MapReduce programming model. The method is characterized in that topics are mined and represented explicitly through keyword co-occurrence relations rather than through an implicit representation, and the big data are processed in a distributed manner to cluster information from the Internet, which gives better scalability, allows larger data volumes to be processed, and greatly increases throughput.
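The graph-building step and its MapReduce distribution can be sketched as follows. This is a minimal single-process simulation, assuming that keywords have already been extracted from each document and that an edge weight is simply the number of documents in which two keywords co-occur; the abstract does not specify the weighting, and the names `map_document`, `shuffle`, `reduce_pair` and `build_adjacency` are hypothetical. In a real Hadoop job the framework performs the shuffle, and the map and reduce functions run on different nodes.

```python
from collections import defaultdict
from itertools import combinations

def map_document(doc_id, keywords):
    """Map: emit ((kw_a, kw_b), 1) for each unordered keyword pair co-occurring in one document.
    doc_id is the input key a MapReduce mapper receives; it is not needed for counting."""
    for a, b in combinations(sorted(set(keywords)), 2):
        yield (a, b), 1

def shuffle(pairs):
    """Shuffle: group values by key (done by the framework in a real MapReduce job)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_pair(key, values):
    """Reduce: the edge weight is the total co-occurrence count of one keyword pair."""
    return key, sum(values)

def build_adjacency(docs):
    """Run map -> shuffle -> reduce over {doc_id: [keywords]}; return (vocab, adjacency matrix)."""
    mapped = (kv for doc_id, kws in docs.items() for kv in map_document(doc_id, kws))
    edges = dict(reduce_pair(key, values) for key, values in shuffle(mapped).items())
    vocab = sorted({kw for kws in docs.values() for kw in kws})
    index = {kw: i for i, kw in enumerate(vocab)}
    adj = [[0] * len(vocab) for _ in vocab]
    for (a, b), weight in edges.items():
        adj[index[a]][index[b]] = adj[index[b]][index[a]] = weight  # undirected graph
    return vocab, adj

docs = {
    "d1": ["flood", "rain", "river"],
    "d2": ["rain", "river"],
    "d3": ["election", "vote"],
}
vocab, adj = build_adjacency(docs)
print(vocab)  # ['election', 'flood', 'rain', 'river', 'vote']
print(adj[vocab.index("rain")][vocab.index("river")])  # 2
```

Emitting each pair once, in sorted order, keeps the key space small. The resulting matrix is the adjacency matrix A whose Laplacian D - A is the input to spectral clustering.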

Description

Technical field
[0001] The invention belongs to the technical field at the intersection of big data analysis and machine learning, and more specifically relates to a topic detection or tracking method for text big data.
Background art
[0002] With the rapid expansion of Internet information, the amount of information has grown exponentially, and the volume of network data is far beyond what people can handle manually. It is difficult for users to quickly extract the information they need from such a large volume of information. Topic Detection and Tracking (TDT) is an information processing technology for automatically detecting new topics and following known topics in news media information streams. Because topic detection and tracking share much with natural language processing technologies such as information retrieval and data mining, and because it is aimed directly at news corpora with bursty characteristics, it has gradually become a research hotspot in big data analy...
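The detection-versus-tracking decision described here can be illustrated with a single-pass sketch. It assumes that each historical topic is represented by the summed keyword counts of the documents attributed to it, that similarity is cosine similarity, and that a new topic is opened when no existing topic reaches a threshold of 0.3. The similarity measure, the threshold and the `TopicTracker` class are all hypothetical, since this excerpt does not state how the patent measures similarity to a topic's attributed set.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two sparse keyword-frequency vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

class TopicTracker:
    """Hypothetical single-pass topic detection/tracking loop."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed value; not given in the source
        self.topics = []            # one Counter of summed keyword counts per historical topic

    def process(self, keywords):
        """Assign an incoming document to a known topic (tracking) or open a new one (detection).
        Returns (topic_id, is_new_topic)."""
        doc = Counter(keywords)
        best_id, best_sim = None, 0.0
        for topic_id, profile in enumerate(self.topics):
            sim = cosine(doc, profile)
            if sim > best_sim:
                best_id, best_sim = topic_id, sim
        if best_id is not None and best_sim >= self.threshold:
            self.topics[best_id].update(doc)  # tracking: fold the document into the topic profile
            return best_id, False
        self.topics.append(doc)               # detection: the document starts a new topic
        return len(self.topics) - 1, True

tracker = TopicTracker()
print(tracker.process(["flood", "rain", "river"]))  # (0, True)
print(tracker.process(["rain", "river", "dam"]))    # (0, False)
print(tracker.process(["election", "vote"]))        # (1, True)
```

Updating the topic profile on every tracked document lets a topic drift as new vocabulary appears; a stricter design would keep a fixed profile or decay old counts.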


Application Information

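IPC (8): G06F17/30
CPC: G06F16/35
Inventors: 邹复好 (Zou Fuhao), 周可 (Zhou Ke), 范瑞 (Fan Rui), 郑胜 (Zheng Sheng), 张胜 (Zhang Sheng), 陈进才 (Chen Jincai), 李春花 (Li Chunhua)
Owner: WUHAN SHUWEI TECH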