Microblog topic detection method based on improved Single-pass clustering algorithm

A clustering algorithm and topic detection technology, applied in text database clustering/classification, computing, and network data retrieval. It addresses the poor efficiency and accuracy of existing single-pass topic detection by preserving the identity of topics and improving computational efficiency.

Inactive Publication Date: 2018-03-23
BEIJING UNIV OF TECH
3 Cites, 10 Cited by
AI Technical Summary

Problems solved by technology

Existing single-pass topic detection can only classify items one at a time, and each new item must be compared with every item that has already been clustered, which results in poor efficiency and accuracy.
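To make the cost of this one-by-one comparison concrete, the following is a minimal sketch of a baseline single-pass pass over document vectors. It is an illustration only: the function names `cosine` and `naive_single_pass`, the use of cosine similarity, and the fixed similarity threshold are assumptions, not details taken from the patent text.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def naive_single_pass(vectors, threshold):
    """Hypothetical baseline: compare each new vector with every earlier one.

    Returns (labels, comparisons). The number of comparisons grows
    quadratically with the number of items, which is the inefficiency
    described above.
    """
    labels = []
    comparisons = 0
    next_label = 0
    for i, v in enumerate(vectors):
        best_sim, best_label = -1.0, None
        for j in range(i):  # every item that has already been clustered
            comparisons += 1
            sim = cosine(v, vectors[j])
            if sim > best_sim:
                best_sim, best_label = sim, labels[j]
        if best_label is not None and best_sim >= threshold:
            labels.append(best_label)  # join the most similar item's cluster
        else:
            labels.append(next_label)  # open a new cluster
            next_label += 1
    return labels, comparisons


vectors = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
labels, comparisons = naive_single_pass(vectors, threshold=0.8)
print(labels, comparisons)  # [0, 0, 1, 1] 6
```

With n items the loop performs n(n-1)/2 similarity computations, so four items already need six comparisons.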

Embodiment Construction

[0029] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0030] The present invention is described in further detail below with reference to the accompanying drawings:

[0031] The present invention provides a microblog topic detection method based on the improved Single-pass clustering algorithm, which uses the idea of the LDA topic probability model to carry out text vector modeling on the microb...

Abstract

The invention discloses a microblog topic detection method based on an improved Single-pass clustering algorithm. The method comprises microblog text content collection, text preprocessing, establishment of a text vector model based on LDA (Latent Dirichlet Allocation), text clustering based on the improved Single-pass clustering algorithm, and result evaluation. The improved Single-pass clustering algorithm makes three changes: it adds a time parameter, it calculates a clustering center point for each category of data, and it inputs the data in batches. Adding the time parameter guarantees the identity of topics. Calculating clustering center points allows new data to be compared with the center points rather than with every piece of clustered data, which reduces the number of comparisons and improves computational efficiency. Inputting the data in batches means that each batch is clustered first and its clustering center points are then compared with the existing center points, which improves operating efficiency and saves operating space.
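The three changes named in the abstract can be sketched as follows. This is a hedged illustration, not the patented procedure: the class `Cluster`, the functions `assign` and `cluster_in_batches`, the cosine measure, the `threshold` and `window` parameters, and the specific rule "a cluster is eligible only if its last timestamp lies within `window` of the new item" are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class Cluster:
    """Hypothetical cluster record: center point, member count, latest time."""

    def __init__(self, center, time, size=1):
        self.center = list(center)
        self.size = size
        self.last_time = time

    def absorb(self, vec, time, weight=1):
        """Update the center as a weighted mean of old center and new vector."""
        total = self.size + weight
        self.center = [(c * self.size + x * weight) / total
                       for c, x in zip(self.center, vec)]
        self.size = total
        self.last_time = max(self.last_time, time)


def assign(clusters, vec, time, threshold, window, weight=1):
    """Compare vec with cluster centers only, skipping clusters outside the window."""
    best_sim, best = -1.0, None
    for cluster in clusters:
        if abs(time - cluster.last_time) > window:  # time parameter (assumed rule)
            continue
        sim = cosine(vec, cluster.center)
        if sim > best_sim:
            best_sim, best = sim, cluster
    if best is not None and best_sim >= threshold:
        best.absorb(vec, time, weight)
    else:
        clusters.append(Cluster(vec, time, weight))


def cluster_in_batches(batches, threshold, window):
    """Cluster each batch locally, then merge its centers into the global centers."""
    global_clusters = []
    for batch in batches:
        local = []
        for vec, time in batch:
            assign(local, vec, time, threshold, window)
        for c in local:
            assign(global_clusters, c.center, c.last_time,
                   threshold, window, weight=c.size)
    return global_clusters


batches = [
    [([1, 0], 0), ([0.9, 0.1], 1), ([0, 1], 1)],
    [([1, 0.05], 2), ([0, 1], 50)],
]
result = cluster_in_batches(batches, threshold=0.8, window=10)
print([c.size for c in result])  # [3, 1, 1]
```

In this toy run the item `[0, 1]` at time 50 has the same content as the earlier `[0, 1]` at time 1 but falls outside the assumed time window, so it opens a separate topic. Only center-to-center comparisons are made at the global level.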

Description

Technical field

[0001] The invention relates to the technical field of topic detection, and in particular to a microblog topic detection method based on an improved Single-pass clustering algorithm.

Background

[0002] LDA (Latent Dirichlet Allocation) is a document topic generation model with three layers: words, topics and documents. Under this generative model, each word of a document is obtained by "selecting a topic with a certain probability, and then selecting a word from that topic with a certain probability". Formula (1) expresses this process:

[0003] $$P(\text{word} \mid \text{document}) = \sum_{\text{topic}} P(\text{word} \mid \text{topic}) \, P(\text{topic} \mid \text{document}) \quad (1)$$

[0004] In the LDA model, two sets of model parameters must be solved for: "word-topic" and "topic-document". The probabilistic graph of the topic model is shown in Figure 1.

[0005] In Figure 1, [...] denotes a "topic-term" probability model with a multinomial probability di...
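Formula (1) can be checked with a small worked example. The sketch below assumes two hypothetical topics and made-up probabilities; the function name `word_given_document` and the dictionary layout are illustrative choices, not part of the patent.

```python
def word_given_document(p_word_given_topic, p_topic_given_doc):
    """Apply formula (1): sum over topics of P(word|topic) * P(topic|document).

    p_word_given_topic: dict mapping topic -> {word: probability}
    p_topic_given_doc:  dict mapping topic -> probability for one document
    """
    words = set()
    for dist in p_word_given_topic.values():
        words.update(dist)
    return {
        w: sum(p_word_given_topic[t].get(w, 0.0) * p_topic_given_doc.get(t, 0.0)
               for t in p_word_given_topic)
        for w in sorted(words)
    }


# Made-up numbers for illustration only.
p_word_given_topic = {
    "sports": {"match": 0.7, "price": 0.3},
    "finance": {"match": 0.1, "price": 0.9},
}
p_topic_given_doc = {"sports": 0.6, "finance": 0.4}

p_word = word_given_document(p_word_given_topic, p_topic_given_doc)
# P(match|doc) = 0.7*0.6 + 0.1*0.4 = 0.46
# P(price|doc) = 0.3*0.6 + 0.9*0.4 = 0.54
print(p_word)
```

Because each per-topic word distribution and the document's topic distribution sum to 1, the resulting word probabilities also sum to 1.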

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06Q50/00
CPC: G06F16/35, G06F16/951, G06F40/289, G06F40/30, G06Q50/01
Inventors: 沈琦, 高云雪
Owner: BEIJING UNIV OF TECH