Topic detection and tracking method based on microblog data

A topic detection and tracking technology for Weibo (microblog) data, applied in the field of topic discovery and tracking, which addresses the problems of a sparse feature matrix and degraded detection results.

Active Publication Date: 2013-11-13
NANJING UNIV OF POSTS & TELECOMM
Cites: 2 | Cited by: 57

AI Technical Summary

Problems solved by technology

If the traditional approach of constructing a vocabulary-text feature matrix is used to analyze topics, certain properties unique to microblog text cause the feature matrix to be highly sparse, and the detection results can be expected to degrade considerably.
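To illustrate why the matrix becomes sparse, here is a minimal sketch using a hypothetical toy corpus of five short, already-tokenized posts (the data and function names are illustrative, not from the patent). It builds a vocabulary-by-post count matrix and measures the fraction of zero cells.

```python
# Hypothetical toy corpus: short, pre-tokenized microblog posts (illustrative only).
posts = [
    ["earthquake", "sichuan", "rescue"],
    ["rescue", "team", "arrives"],
    ["new", "phone", "launch"],
    ["phone", "battery", "review"],
    ["sichuan", "earthquake", "donation"],
]

def term_document_matrix(docs):
    """Build a vocabulary-by-document count matrix as nested lists."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = [[0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for term in doc:
            matrix[index[term]][j] += 1
    return vocab, matrix

def sparsity(matrix):
    """Fraction of matrix cells that are zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for v in cells if v == 0) / len(cells)

vocab, matrix = term_document_matrix(posts)
print(len(vocab), "terms x", len(posts), "posts")
print(f"sparsity = {sparsity(matrix):.2f}")
```

Here only 15 of the 55 cells are non-zero (sparsity about 0.73). Because each post contributes only a handful of terms, the zero fraction grows toward 1 as the vocabulary grows.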

Embodiment Construction

[0028] The invention will be described in further detail below in conjunction with the accompanying drawings.

[0029] Step 1: Data Preprocessing

[0030] ① Ignore directed conversational messages, that is, Weibo posts in the "@username" format. Such posts are mostly dialogue directed between users and are less likely to describe general topics, so removing them eliminates as much noise data as possible.
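The filtering rule can be sketched as below. This is a minimal illustration under an assumption: a post is treated as a directed message if it contains any "@username" mention anywhere in its text. The patent does not give the exact matching rule, and the regex (`MENTION`) and helper names are hypothetical.

```python
import re

# Assumption: any "@username" mention marks a post as a directed message.
# The mention ends at whitespace, another "@", or an ASCII/full-width colon or comma.
MENTION = re.compile(r"@[^\s@:：,，]+")

def is_directed(post: str) -> bool:
    """Return True if the post contains an @username mention."""
    return MENTION.search(post) is not None

def drop_directed(posts):
    """Keep only posts that are not directed at a specific user."""
    return [p for p in posts if not is_directed(p)]

posts = [
    "@alice are you coming tonight?",
    "Heavy rain floods downtown subway stations",
    "Thanks @bob_2013, see you soon",
    "New smartphone launch draws long queues",
]
print(drop_directed(posts))
```

Note that this simple pattern would also match the domain part of an e-mail address; a production filter would need a stricter rule.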

[0031] ② Expand the original Weibo data. Information from the external links (URLs) embedded in the microblog text is extracted and added to the microblog post to support the description of the user's viewpoint. The extracted data is used to calculate TF-IDF values in the next step.
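A minimal sketch of the expansion step follows. It assumes a hypothetical `fetch` callback that returns the extracted text of a linked page (or `None`), since the patent does not say how page content is retrieved or extracted. A plain dictionary stands in for the network fetch so the example runs offline. Dropping the raw URL string from the post is also an assumption of this sketch.

```python
import re
from typing import Callable, Optional

URL = re.compile(r"https?://\S+")

def expand_post(post: str, fetch: Callable[[str], Optional[str]]) -> str:
    """Append text extracted from each embedded URL to the post.

    `fetch` is a hypothetical callback returning the extracted text of the
    linked page, or None if nothing can be retrieved.
    """
    extra = []
    for url in URL.findall(post):
        text = fetch(url)
        if text:
            extra.append(text)
    # Assumption: the URL string itself carries no topic words, so it is removed.
    body = URL.sub("", post).strip()
    return " ".join([body] + extra)

# Stub standing in for a real HTTP fetch plus content-extraction step.
PAGES = {"http://example.com/news/1": "Flood warning issued for the Yangtze river basin"}

print(expand_post("Stay safe everyone http://example.com/news/1", PAGES.get))
```

Passing a callback keeps the expansion logic independent of whichever crawler or extractor is actually used.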

[0032] ③ Normalize the Weibo text. To normalize the microblog data, the data is first preprocessed. After word segmentation, stop-word removal, processing of high- and low-frequency words, and the changed TF-ID...
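The normalization pipeline can be sketched as follows, with two caveats. First, the patent's "changed" TF-IDF variant is truncated in this text, so the sketch uses plain TF-IDF (tf = count / document length, idf = log(N / df)) instead. Second, word segmentation is assumed to have been done already, for example by a Chinese segmenter. The stop-word list and the `min_df` / `max_df_ratio` thresholds are hypothetical values.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real system would use a full Chinese list.
STOP_WORDS = {"the", "a", "of", "is", "and"}

def normalize(docs, min_df=2, max_df_ratio=0.8):
    """Remove stop words, then drop low- and high-frequency terms.

    `docs` are assumed to be word-segmented already (lists of tokens).
    A term is kept if it occurs in at least `min_df` documents and in at
    most `max_df_ratio` of all documents.
    """
    docs = [[t for t in doc if t not in STOP_WORDS] for doc in docs]
    df = Counter(term for doc in docs for term in set(doc))
    n = len(docs)
    keep = {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}
    return [[t for t in doc if t in keep] for doc in docs]

def tf_idf(docs):
    """Plain TF-IDF weights per document (not the patent's modified variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return weights

docs = [
    ["the", "flood", "hits", "city"],
    ["flood", "rescue", "in", "city"],
    ["rescue", "team", "and", "flood"],
    ["phone", "launch", "is", "today"],
    ["phone", "sales", "of", "launch"],
]
clean = normalize(docs)
for w in tf_idf(clean):
    print({t: round(v, 3) for t, v in w.items()})
```

Terms that appear in only one post ("hits", "team", "sales") are dropped as low-frequency, which also shrinks the sparse matrix discussed above.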

Abstract

The invention discloses a topic detection and tracking method based on microblog data, which mines latent topics hidden in large-scale social network information. The method comprises the following steps: first, partitioning the massively growing microblog data into time windows according to its temporal order, and filtering out redundant information; second, analyzing and classifying the text content within each time window, extracting and returning key topic descriptions with independent semantics, and thereby extracting the topics of each time window; and finally, analyzing the inheritance and identity of topics across time windows to determine how microblog topics evolve. The method can show the dynamic development of topic content, namely the emergence, development, peak and decline of topics, and describes topics more accurately and completely.
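The first and last stages of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's method. Posts are grouped into fixed-width time windows. Each extracted topic is represented as a term-weight dictionary, and the in-window topic extraction is not reproduced. Topics in consecutive windows are linked by cosine similarity against a hypothetical threshold of 0.5. Both the similarity measure and the threshold are choices made for this sketch.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

def partition_by_window(posts, start, width):
    """Group (timestamp, text) pairs into consecutive fixed-width time windows.

    Assumes every timestamp is >= start; window k covers
    [start + k*width, start + (k+1)*width).
    """
    windows = defaultdict(list)
    for ts, text in posts:
        windows[(ts - start) // width].append(text)  # timedelta // timedelta -> int
    return dict(sorted(windows.items()))

def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_topics(prev_topics, next_topics, threshold=0.5):
    """Link each topic in the next window to its most similar predecessor.

    Returns {next_index: prev_index or None}; None marks a newly emerging topic.
    """
    links = {}
    for j, topic in enumerate(next_topics):
        scores = [(cosine(topic, p), i) for i, p in enumerate(prev_topics)]
        best, i = max(scores, default=(0.0, None))
        links[j] = i if best >= threshold else None
    return links

start = datetime(2013, 1, 1)
posts = [
    (datetime(2013, 1, 1, 1, 0), "post a"),
    (datetime(2013, 1, 1, 7, 30), "post b"),
    (datetime(2013, 1, 1, 2, 0), "post c"),
]
print(partition_by_window(posts, start, timedelta(hours=6)))
# -> {0: ['post a', 'post c'], 1: ['post b']}

window1 = [{"flood": 0.6, "rescue": 0.4}, {"phone": 0.7, "launch": 0.3}]
window2 = [{"flood": 0.5, "donation": 0.5}, {"exam": 0.9, "results": 0.1}]
print(link_topics(window1, window2))
# -> {0: 0, 1: None}
```

In this toy run, the flood topic continues into the second window (similarity about 0.59), the exam topic appears as new, and the phone topic has no successor. Under this reading, that topic would be treated as having died out.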

Description

Technical Field

[0001] The invention relates to the technical field of data mining, in particular to a topic discovery and tracking method based on microblog data.

Background

[0002] With the rapid development of Web 2.0 and advances in the means of information dissemination, Weibo has in recent years grown into a fast-developing and highly influential form of mass online media. As a new information carrier and dissemination channel, Weibo lets netizens comment on products and services more conveniently and take part in discussions of hot topics, and it plays an increasingly important role in the initiation and dissemination of Internet public opinion. Not all of the massive, real-time growth in microblog information is valuable to users. It is necessary to automatically extract hot topics that interest users from the massive volume of microblog information, and to filter out redundant data that has no practical ...

Application Information

Patent Type & Authority: Application (China)
IPC: IPC(8) G06F17/30
Inventors: 孙国梓 (Sun Guozi), 黄斯琪 (Huang Siqi), 杨一涛 (Yang Yitao), 陈国兰 (Chen Guolan), 仇呈燕 (Qiu Chengyan), 郑冬亚 (Zheng Dongya)
Owner: NANJING UNIV OF POSTS & TELECOMM