Microblog topic detection method and system based on incremental clustering algorithm

An incremental clustering and topic detection technology, applied in text database clustering/classification, computing, unstructured text data retrieval, etc., that addresses problems such as slow processing speed and failure to discover hot topics in a timely manner

Inactive Publication Date: 2017-10-24
GUANGXI UNIVERSITY OF TECHNOLOGY +1
4 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

The disadvantage of this invention is that it needs to construct microblog text clues to form a microblog text forest, and needs to analyze a large number of microblog topics to form a microblog topic database. Such an invention obviously works very well in specific fields, but its processing speed will be relatively slow given the explosive growth of Weibo on the mobile Internet, and its ability to discover hot topics in a timely manner may be limited.
The disadvantage of this invention is that the main

Examples

Embodiment

[0084] A microblog topic detection method based on an incremental clustering algorithm, as shown in Figures 1 and 2, comprises the following steps:

[0085] S1, for ease of explanation, this embodiment obtains a set of microblog information from the Internet, as shown in Table 1:

[0086] Table 1

[0087]

[0088] S2, preprocessing the microblog information collection:

[0089] (1) Delete the information of users whose number of followers is less than the threshold F; in this embodiment, the seventh microblog is excluded from the microblogs to be processed because its number of followers is less than 30;

[0090] (2) Ignore directed conversational messages; in this embodiment, the seventh microblog is excluded from the microblogs to be processed because it contains "@user".

[0091] (3) Segment the microblog text and keep only verbs, nouns and adjectives. This method is based on the ICTCLAS2013 version of the Chinese word segmentation system of the Chinese Academy of Sciences...
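
The three preprocessing rules in S2 can be sketched as a simple filter. This is a minimal illustration, not the patented implementation: the `Microblog` record, the `preprocess` function and the tag set `KEPT_POS` are hypothetical names, and the text is assumed to be segmented and part-of-speech tagged already (the source uses ICTCLAS2013 for that step, which is not called here).

```python
from dataclasses import dataclass

# Part-of-speech tags kept by rule (3): noun, verb, adjective.
# The tag names are an assumption made for this sketch.
KEPT_POS = {"n", "v", "a"}


@dataclass
class Microblog:
    text: str             # raw microblog text
    followers: int        # follower count of the author
    tagged_tokens: list   # (word, pos) pairs from a segmenter, assumed given


def preprocess(posts, follower_threshold):
    """Apply rules (1)-(3) and return one word list per surviving post."""
    kept = []
    for post in posts:
        # Rule (1): drop posts whose author has fewer followers than F.
        if post.followers < follower_threshold:
            continue
        # Rule (2): drop directed conversation, detected here by an "@" mention.
        if "@" in post.text:
            continue
        # Rule (3): keep only nouns, verbs and adjectives.
        words = [word for word, pos in post.tagged_tokens if pos in KEPT_POS]
        kept.append(words)
    return kept
```

Calling `preprocess(posts, 30)` mirrors the embodiment's follower limit of 30; any real system would substitute the output of its own segmenter for `tagged_tokens`.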

Abstract

The invention discloses a microblog topic detection method and system based on an incremental clustering algorithm. The method comprises the steps of: S1, acquiring a microblog information set; S2, preprocessing the microblog information set; S3, after preprocessing, extracting characteristic words according to word occurrence frequencies, word distribution in the microblog text, and word distribution in a time window; S4, weighting the characteristic words, and vectorizing the characteristic words and their weights; S5, merging topics by means of a similarity judging method based on the distance between vectors. The method and system perform well in terms of recall rate, accuracy and the like, and run substantially faster than the k-means method.
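
Steps S4 and S5 can be illustrated with a single-pass incremental clustering sketch. Several choices below are assumptions rather than details taken from the source: words are weighted by plain relative frequency, similarity is measured with cosine similarity (the abstract only says the judgment is based on the distance between vectors), the threshold value is arbitrary, and the time-window distribution from S3 is omitted. The function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(words):
    """Map a word list to a sparse vector of relative frequencies (assumed weighting)."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def incremental_cluster(docs, threshold=0.5):
    """Assign each document to the most similar topic, or open a new topic.

    Each document is seen once, in arrival order, so no re-clustering of
    earlier documents is needed when new microblogs arrive.
    """
    centroids = []  # one summed vector per topic (cosine ignores the scale)
    labels = []
    for words in docs:
        vec = vectorize(words)
        best, best_sim = None, 0.0
        for index, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = index, sim
        if best is not None and best_sim >= threshold:
            # Merge the document into the existing topic.
            for word, value in vec.items():
                centroids[best][word] = centroids[best].get(word, 0.0) + value
            labels.append(best)
        else:
            # No topic is similar enough: start a new one.
            centroids.append(dict(vec))
            labels.append(len(centroids) - 1)
    return labels
```

Unlike k-means, this pass does not fix the number of topics in advance and does not iterate over the whole collection repeatedly, which is the usual reason single-pass schemes are faster on streaming data.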

Description

Technical field

[0001] The invention belongs to the technical field of topic detection, and in particular relates to a microblog topic detection method and system based on an incremental clustering algorithm.

Background technique

[0002] With the development of Internet technology and the rapid growth of its applications, especially after the rise of web2. Netizens' attention and love. Weibo is a platform for sharing, disseminating and acquiring information based on user relationships. It can share and disseminate information in real time through the Internet, the mobile Internet or various clients. A Weibo post carries at most 140 characters of text and can be accompanied by pictures, sound and video files, providing users with rich and diversified ways to share and disseminate information. At present, microblogs have become an important platform for netizens to express their various emotions, especially in today's state of increasing efforts to crack down on Internet ...

Application Information

IPC(8): G06F17/30, G06Q50/00
CPC: G06F16/35, G06Q50/01
Inventor: 王萌, 王晓荣, 梁伟鄯
Owner: GUANGXI UNIVERSITY OF TECHNOLOGY