Method for detecting burst topics in user-generated text streams based on graph clustering

A topic detection and text stream technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses problems such as different kinds of graph edges not being well distinguished from one another

Active Publication Date: 2011-10-12
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

However, the community structure detection method is still insufficient to solve the clustering problem of burst words.
First, this method uses the co-occurrence count of two graph vertices to measure the correlation between them, and this non-normalized measure cannot distinguish between the same topic burst words and Linking Edges B



Examples


Embodiment Construction

[0028] The method proposed by the present invention for detecting burst topics in user-generated text streams based on graph clustering is described in detail below with reference to the accompanying drawings and embodiments:

[0029] The burst topic detection method of the present invention, as shown in Figure 1, includes the following steps:

[0030] 1) Obtain user-generated documents: first, collect a large number of web-format documents from Web 2.0 sites (such as blog posts and microblogs; these are time-stamped documents in web format generated by Web 2.0 users). Then extract the body text of each web-format document as the processed document, extract its publication time at the same time, and save both;

[0031] 2) Construct the text stream: set the time unit (such as hour, day, or week), and set the size of the detection time window to one time unit; divide the processed document according to the ...
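
Step 2) can be illustrated with a minimal Python sketch. Because the step text is truncated here, the sketch assumes that processed documents are divided by their publication time into consecutive windows of one time unit. The function names `window_key` and `build_text_stream`, the `(published, text)` tuple layout, and the key formats are hypothetical choices made for illustration, not taken from the patent.

```python
from collections import defaultdict
from datetime import datetime


def window_key(published, unit):
    """Map a publication time to a sortable label for its time window."""
    if unit == "hour":
        return published.strftime("%Y-%m-%d %H:00")
    if unit == "day":
        return published.strftime("%Y-%m-%d")
    if unit == "week":
        iso_year, iso_week, _ = published.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    raise ValueError(f"unsupported time unit: {unit}")


def build_text_stream(documents, unit="day"):
    """Group (published, text) pairs into time windows, ordered by window label.

    Assumption: each window spans exactly one time unit, and documents are
    assigned to windows by publication time.
    """
    windows = defaultdict(list)
    for published, text in documents:
        windows[window_key(published, unit)].append(text)
    # Labels are zero-padded, so lexical order matches chronological order.
    return dict(sorted(windows.items()))


docs = [
    (datetime(2011, 10, 12, 9, 30), "first post"),
    (datetime(2011, 10, 13, 8, 0), "third post"),
    (datetime(2011, 10, 12, 17, 0), "second post"),
]
stream = build_text_stream(docs, unit="day")
```

Each value in `stream` is the batch of documents for one detection time window, so later steps can process the windows in order.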


Abstract

The invention relates to a method for detecting burst topics in user-generated text streams based on graph clustering and belongs to the technical field of Internet data mining. The method offers a new, graph-based view of the conventional topic detection problem: detecting burst topics in a text stream is converted into a typical graph clustering problem, so it can be solved with conventional graph theory methods. The method comprises the following main steps: acquiring the text stream; detecting the burst topic; constructing a burst word graph; and clustering burst words. The method targets burst topic detection in user-generated text streams and outperforms conventional methods based on document clustering, probabilistic topic models, and burst feature clustering.
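
The last two steps, constructing a burst word graph and clustering burst words, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented procedure: the burst words are taken as given input, the edge weight is a Jaccard-normalized co-occurrence (one possible normalized measure), edges below a hypothetical `min_weight` threshold are dropped, and clusters are simply the connected components of the resulting graph. All function and parameter names are invented for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_burst_word_graph(documents, burst_words, min_weight=0.3):
    """Link burst words whose Jaccard co-occurrence weight reaches min_weight.

    Assumptions: `burst_words` is a set produced by an earlier detection
    step, and Jaccard similarity stands in for the normalized edge measure.
    """
    doc_sets = defaultdict(set)  # word -> ids of documents containing it
    for doc_id, text in enumerate(documents):
        for word in set(text.lower().split()) & burst_words:
            doc_sets[word].add(doc_id)

    graph = {word: set() for word in burst_words}
    for u, v in combinations(sorted(burst_words), 2):
        union = doc_sets[u] | doc_sets[v]
        if not union:
            continue  # neither word occurs in this window
        weight = len(doc_sets[u] & doc_sets[v]) / len(union)
        if weight >= min_weight:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def cluster_burst_words(graph):
    """Return connected components; each one is a candidate burst topic."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


window_docs = [
    "volcano eruption ash",
    "volcano ash cloud",
    "election vote result",
    "election vote",
]
burst = {"volcano", "ash", "election", "vote"}
topics = cluster_burst_words(build_burst_word_graph(window_docs, burst))
```

Normalizing by the union of the two document sets keeps a pair of very frequent words from receiving a heavy edge merely because both appear often, which is the weakness of raw co-occurrence counts noted in the problem statement.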

Description

Technical Field

[0001] The invention belongs to the technical field of Internet data mining, and in particular relates to a method for detecting burst topics in text streams.

Background Art

[0002] Accurately detecting burst topics in massive volumes of user-generated text is of great significance for government decision-making and commercial promotion. A burst topic can be a popular event that happens at any moment, or an online activity that bloggers spontaneously and widely take part in over a period of time. Events can be unpredictable, like a volcanic eruption, or predictable, like a presidential election. Activities are generally unpredictable, such as the spread of blog quizzes (Internet quizzes). A burst topic usually lasts only a short time and is hotly discussed by a large number of Internet users.

[0003] However, due to limitations such as weak modeling and inflexible parameter settings, existing text clustering, probabilistic topic models, and bursty feature extr...


Application Information

IPC(8): G06F17/30
Inventors: 赵丽, 管晓宏, 袁睿翕
Owner: TSINGHUA UNIV