Hot topic detection method of Chinese micro-blogs

A hot topic detection technology, applied in unstructured text data retrieval, network data retrieval, and other database retrieval. It addresses the sparse information in short microblog texts, which hinders hot topic detection, and the high complexity of existing algorithms. It offers fast detection, high accuracy, and broad application prospects.

Active Publication Date: 2014-04-23
FUZHOU UNIVERSITY
3 Cites, 36 Cited by

AI Technical Summary

Problems solved by technology

[0010] Existing techniques approach the problem from a modeling point of view, for example by adapting LDA to microblogs, as in the Author-topic and Twitter-LDA models. These models can be used effectively for microblog topic modeling, but the complexity of this type of algorithm is generally high, which is not conducive to detecting hot topics from large ...




Example Embodiment

[0045] The present invention will be further described below in conjunction with the drawings and specific embodiments.

[0046] As shown in Figure 1, the hot topic detection method for Chinese microblogs of the present invention includes the following steps:

[0047] Step (1): Filter spam microblogs

[0048] Microblog streams contain many noisy posts, such as advertising promotions, microblog activities, and users' personal posts. The present invention therefore first filters spam microblogs according to a set of spam filtering rules. A microblog is filtered out if it contains any one of the following:

[0049] a) Special characters: including "★", "▲", "¥", "『", "◆", "●", "①", etc.;

[0050] b) Promotion-related Chinese phrases: including "share from", "participated in voting", "event recommendation", etc.;

[0051] c) The web link "http://t.cn/";

[0052] d) The symbol "#".
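The four rules above can be sketched as a simple substring filter. This is a minimal illustration, not the patent's implementation: the names `is_spam` and `filter_spam` are hypothetical, the marker lists are incomplete because the source ends each with "etc.", and the promotion phrases appear here only in the English translation given above, since the original Chinese strings are not in this text.

```python
# Rule (a): special characters listed in the text (the source list ends with "etc.").
SPECIAL_CHARS = ("★", "▲", "¥", "『", "◆", "●", "①")

# Rule (b): promotion phrases, as translated in the text. A real filter would
# match the original Chinese phrases, which are not given here (assumption).
PROMO_PHRASES = ("share from", "participated in voting", "event recommendation")

# Rule (c): the short-link prefix, and rule (d): the "#" symbol.
SHORT_LINK = "http://t.cn/"
HASH_SYMBOL = "#"

ALL_MARKERS = SPECIAL_CHARS + PROMO_PHRASES + (SHORT_LINK, HASH_SYMBOL)


def is_spam(text: str) -> bool:
    """Return True if the post contains at least one spam marker."""
    return any(marker in text for marker in ALL_MARKERS)


def filter_spam(posts: list[str]) -> list[str]:
    """Keep only the posts that match none of the rules."""
    return [post for post in posts if not is_spam(post)]


if __name__ == "__main__":
    sample = [
        "Heavy rain in Fuzhou today",
        "Big sale ★ today only",
        "Read this http://t.cn/abc",
        "#topic# something happened",
    ]
    print(filter_spam(sample))
```

Plain substring matching keeps the filter linear in the text length, which fits the stated goal of fast detection; note that rule (d) discards every hashtagged post, exactly as the rule list specifies.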

[0053] Step (2) Preliminarily aggregate the keywords distributed in We...


Abstract

The invention relates to a hot topic detection method for Chinese microblogs. The method includes the following steps: (1) filtering spam microblogs according to a set of spam filtering rules; (2) preliminarily aggregating the keywords distributed across the microblogs to obtain an initial set of words expressing topics, retrieving the top k most similar microblogs for each microblog, and then enriching the features of each microblog using the retrieval results and the preliminary keyword aggregation results, to obtain an enriched feature vector for each microblog; and (3) clustering all microblogs with an incremental clustering method based on the enriched feature vectors to obtain a set of clustered topics, computing the popularity of each clustered topic with a topic popularity formula, and finally obtaining a hot topic list. The method detects hot topics in Chinese microblogs efficiently and accurately, with high detection speed, high accuracy, a wide application range, and high applicability.
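Step (3) names an incremental clustering method without giving its details in this excerpt. As a rough sketch, one common form is single-pass clustering with cosine similarity: each microblog joins the most similar existing cluster if the similarity reaches a threshold, and otherwise starts a new cluster. Everything below is an assumption made for illustration, including the function names, the sparse dict representation of feature vectors, the summed-vector centroid, and the threshold value of 0.5. The patent's topic popularity formula is not reproduced in this text, so no popularity score is computed here.

```python
import math
from collections import Counter


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def incremental_cluster(vectors: list[dict], threshold: float = 0.5) -> list[dict]:
    """Single-pass clustering: assign each vector to its best cluster or open a new one.

    Each cluster keeps the running sum of its members' vectors as a centroid;
    cosine similarity is scale-invariant, so the sum behaves like the mean.
    """
    clusters: list[dict] = []
    for index, vector in enumerate(vectors):
        best_cluster, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if sim > best_sim:
                best_cluster, best_sim = cluster, sim
        if best_cluster is not None and best_sim >= threshold:
            best_cluster["members"].append(index)
            best_cluster["centroid"].update(vector)  # Counter.update adds weights
        else:
            clusters.append({"centroid": Counter(vector), "members": [index]})
    return clusters


if __name__ == "__main__":
    enriched = [
        {"rain": 1.0, "fuzhou": 1.0},
        {"rain": 1.0, "fuzhou": 1.0, "flood": 1.0},
        {"football": 1.0, "match": 1.0},
    ]
    for cluster in incremental_cluster(enriched):
        print(cluster["members"])
```

Because each microblog is compared only against the current cluster centroids, the stream is processed in one pass and never revisits earlier assignments, which is the usual reason to prefer an incremental scheme for large volumes of posts.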

Description

Technical field

[0001] The present invention relates to the technical field of topic detection and tracking, and more specifically to a hot topic detection method for Chinese microblogs. It can be applied to hot topic detection and popularity ranking, and is suitable for Chinese microblog platforms, including Sina Weibo, Tencent Weibo, NetEase Weibo, etc.

Background

[0002] The Topic Detection and Tracking (TDT) task started in 1996. A topic contains a series of events or activities, together with directly related events and activities. A TDT event represents something that happened at a specific time and place, together with all necessary preconditions and unavoidable consequences.

[0003] After more than ten years of rapid development, topic detection and tracking has produced a series of mature theories, including hidden Markov models, aging theory, time series analysis, LDA, etc.

[0004] Hot topics are topics that appear frequently within a period of time. A ...


Application Information

IPC(8): G06F17/30
CPC: G06F16/35, G06F16/951
Inventor: 陈国龙, 廖祥文, 郭德清, 郭文忠, 魏晶晶
Owner FUZHOU UNIVERSITY