Detection method of network sudden hot events based on topic model

A hot-event detection technique based on topic models, applied in fields such as special-purpose data processing, instruments, and electrical digital data processing. It addresses the lack of optimization of the original data and the weakening of emergent topics by static ones, and achieves the effect of eliminating that interference.

Status: Inactive
Publication Date: 2011-12-21
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0005] Although this method can detect some network emergencies, it has two defects: 1) clustering is not the best approach to topic modeling, and a topic model is better suited to topic mining; 2) when generating candidate topics, the method does not optimize the original data according to the burst characteristics of the time-series data, so the candidate topics contain many time-independent static topics.
In turn, the emergent topics we care about are weakened by interference from these static topics.



Examples


Embodiment

[0074] The experiment uses a Twitter microblog data set containing 281,734 documents and 22,063 words, collected from April 13, 2011 to May 11, 2011. The experimental parameters are listed in the following table:

[0075]

[0076] Because Twitter documents are short and very numerous, we set a relatively low document distribution filter coefficient. For experiments on a data set of long-form news reports, a larger filter coefficient should be chosen. Screening yielded 290 feature words and 11,768 feature documents.
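The screening step can be sketched roughly as follows. This excerpt gives neither the burst formula nor the value of the filter coefficient, so the peak-to-mean score and the names `burst_score`, `select_feature_words`, `select_feature_documents`, `min_burst` and `min_doc_ratio` are illustrative assumptions, not the computation defined in the patent.

```python
def burst_score(daily_counts):
    """Peak-to-mean ratio of a word's daily counts.

    ASSUMPTION: this ratio is only a stand-in for the patent's burst measure,
    which is not given in this excerpt.
    """
    total = sum(daily_counts)
    if total == 0:
        return 0.0
    mean = total / len(daily_counts)
    return max(daily_counts) / mean


def select_feature_words(word_date_counts, min_burst):
    """Keep words whose assumed burst score reaches min_burst."""
    return sorted(
        word
        for word, counts in word_date_counts.items()
        if burst_score(counts) >= min_burst
    )


def select_feature_documents(docs, feature_words, min_doc_ratio):
    """Keep indices of documents whose share of feature words reaches
    min_doc_ratio (a hypothetical reading of the filter coefficient)."""
    feature_set = set(feature_words)
    kept = []
    for idx, tokens in enumerate(docs):
        if not tokens:
            continue
        hits = sum(1 for token in tokens if token in feature_set)
        if hits / len(tokens) >= min_doc_ratio:
            kept.append(idx)
    return kept


# Toy data: one bursty word and two words that are flat over time.
word_date_counts = {
    "earthquake": [0, 0, 9, 1],
    "lunch": [3, 2, 3, 2],
    "the": [5, 5, 5, 5],
}
docs = [
    ["earthquake", "the"],
    ["lunch", "the"],
    ["the", "the", "the", "earthquake"],
]
feature_words = select_feature_words(word_date_counts, min_burst=2.0)
feature_docs = select_feature_documents(docs, feature_words, min_doc_ratio=0.3)
```

In this sketch a lower `min_doc_ratio` retains more short documents, which is in the spirit of the remark above about choosing a low coefficient for Twitter data and a larger one for long-form news.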

[0077] Topic modeling is then performed with the probabilistic latent semantic analysis (PLSA) model, with the initial number of topics set to 50. After event screening and duplicate removal, a total of 15 emergent network hot events are obtained, three of which have two feature words while the rest have only one. This is because Twitter documents are short. The data set use...
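For readers unfamiliar with PLSA, the following is a minimal sketch of the standard EM procedure in pure Python. It is not the patent's implementation: the function name `plsa`, the random initialization, the fixed iteration count and the toy corpus are all assumptions made for illustration.

```python
import random


def plsa(doc_word_counts, n_words, n_topics, n_iter=50, seed=0):
    """Fit a PLSA model with EM (illustrative sketch only).

    doc_word_counts: one dict {word_id: count} per document.
    Returns (p_w_z, p_z_d) with p_w_z[z][w] = P(w|z) and p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    n_docs = len(doc_word_counts)

    def normalized(row):
        total = sum(row)
        return [x / total for x in row]

    # Random positive initialization, each row normalized to sum to 1.
    p_w_z = [normalized([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalized([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d, counts in enumerate(doc_word_counts):
            for w, n_dw in counts.items():
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                norm = sum(post)
                if norm == 0.0:
                    continue
                for z in range(n_topics):
                    resp = n_dw * post[z] / norm
                    new_w_z[z][w] += resp
                    new_z_d[d][z] += resp
        # M-step: renormalize the expected counts; keep old rows if empty.
        p_w_z = [normalized(row) if sum(row) > 0 else p_w_z[z]
                 for z, row in enumerate(new_w_z)]
        p_z_d = [normalized(row) if sum(row) > 0 else p_z_d[d]
                 for d, row in enumerate(new_z_d)]
    return p_w_z, p_z_d


# Toy corpus: word ids 0-1 co-occur in two documents, ids 2-3 in the others.
toy_docs = [{0: 3, 1: 2}, {0: 2, 1: 3}, {2: 4, 3: 1}, {2: 1, 3: 4}]
p_w_z, p_z_d = plsa(toy_docs, n_words=4, n_topics=2, n_iter=30)
```

Under this sketch, the experiment described above would correspond to calling `plsa` on the screened feature documents with `n_topics=50`. How topics are then screened into events is not specified in this excerpt.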



Abstract

The invention discloses a method for detecting emergent network hot events based on a topic model. The method comprises the following steps: 1) performing word segmentation on a document data set to obtain a word list, a document-word relation matrix, a word-document distribution matrix and a word-date distribution matrix; 2) screening the data set according to the burst characteristics of the relevant words and documents during the emergence of network hot events; 3) obtaining the feature words and feature documents of the emergent hot events through topic modeling; and 4) computing the attention date distribution of each hot event. Compared with the prior art, the invention has two advantages: topic modeling is carried out with a topic model, so a topic event can be described more accurately; and a method for computing the burst characteristics of words is introduced and used to screen the data set, so time-independent topics are filtered out and the actual emergent hot events are obtained.
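The data structures named in step 1 can be sketched as follows. This is only an illustration: whitespace tokenization stands in for real word segmentation, and the function name `build_matrices`, the input format of `(date, text)` pairs and the dictionary-based representation of the matrices are assumptions, not details taken from the patent.

```python
from collections import Counter, defaultdict


def build_matrices(dated_docs):
    """Build the step-1 structures from (date_string, text) pairs.

    ASSUMPTION: lower-casing and whitespace splitting replace the word
    segmentation step, which the source does not detail.
    Returns (vocab, doc_word, word_doc, word_date):
      vocab     sorted word list
      doc_word  per-document Counter of word counts (document-word relation)
      word_doc  word -> set of document indices (word-document distribution)
      word_date word -> Counter of date -> occurrences (word-date distribution)
    """
    doc_word = []
    word_doc = defaultdict(set)
    word_date = defaultdict(Counter)
    for idx, (date, text) in enumerate(dated_docs):
        tokens = text.lower().split()
        counts = Counter(tokens)
        doc_word.append(counts)
        for word, n in counts.items():
            word_doc[word].add(idx)
            word_date[word][date] += n
    vocab = sorted(word_doc)
    return vocab, doc_word, dict(word_doc), dict(word_date)


sample = [
    ("2011-04-13", "quake hits city"),
    ("2011-04-13", "quake news"),
    ("2011-04-14", "city news news"),
]
vocab, doc_word, word_doc, word_date = build_matrices(sample)
```

The per-word date counts in `word_date` are the time series on which a burst measure for step 2 would operate.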

Description

Technical field

[0001] The invention relates to the field of topic models and event detection, and in particular to a method for detecting network hot events based on topic models.

Background art

[0002] With the rapid development and wide application of network technology, the Internet has gradually become an important channel through which people obtain information. Hundreds of millions of pieces of network information emerge worldwide every day. How to detect emergent hot events in this mass of network information has become an emerging research topic.

[0003] Traditional topic models, such as PLSA (Probabilistic Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation), can be used to perform topic mining on a document set. They approximate each topic in the document set through iterative calculation. However, these topic models are based on the BOW (Bag of Words) model, which considers only the affiliation between words and documents, ignoring the time information of wor...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 张寅, 邵健, 刘霄, 吴飞
Owner: ZHEJIANG UNIV