A hot topic discovery method based on BTM and Single-pass

A hot topic discovery method applied in the field of text clustering. It addresses three problems: processing the data on a single host and processor is time-consuming and laborious, the characteristics of microblog streaming data are not considered, and the classification effect needs to be improved. The method maintains the quality of topic discovery while improving data processing efficiency and the effect of calculation and analysis.

Active Publication Date: 2021-02-09
HOHAI UNIV

AI Technical Summary

Problems solved by technology

First, the amount of data to be processed during hot topic discovery is huge, and processing it on a single host and processor is time-consuming and laborious.
Second, the plain BTM model is too slow at mining topics from the data. Finally, modeling with the BTM topic model alone does not consider the characteristics of Weibo streaming data, so its classification effect needs to be improved.


Examples


Embodiment Construction

[0043] The present invention provides a method for discovering hot topics based on BTM and Single-pass, which is suitable for short texts and sparse stream data. The main steps of the method are: (1) cluster analysis using the improved Single-pass algorithm; (2) distributed parallelization, with MapReduce, of the hot topic discovery method based on BTM and Single-pass.

[0044] (1) Improved Single-pass algorithm for cluster analysis

[0045] As shown in Figure 1, the data set D is divided into multiple data slices of a given size, D_1, D_2, ..., D_n, and the slices are input in sequence.
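The slicing step above can be illustrated with a minimal sketch. The function name `split_into_slices` and the `slice_size` parameter are hypothetical; the source only says the data set is divided "according to a certain scale", so a fixed slice size is an assumption made here for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_slices(dataset: Sequence[T], slice_size: int) -> List[List[T]]:
    """Split `dataset` into consecutive slices D_1..D_n of at most `slice_size` items.

    Assumption: slices are contiguous and keep the original arrival order,
    so that they can be fed to the clustering step in sequence.
    """
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [list(dataset[i:i + slice_size]) for i in range(0, len(dataset), slice_size)]


# Hypothetical input: five short posts, split into slices of two.
posts = ["post a", "post b", "post c", "post d", "post e"]
slices = split_into_slices(posts, 2)
```

The last slice may be shorter than the others; whether the original method pads or rebalances it is not stated in the source.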

[0046] 1) The data slices D_1, D_2, ..., D_n are used as input data in sequence. Each slice is first clustered internally, using a method similar to the classic Single-pass algorithm, which yields a separate clustering result for each slice;

[0047] 2) D_1 is chosen as the first part...
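As a rough illustration of step 1) only, the following sketch clusters each slice independently with a classic Single-pass procedure. It is not the patent's improved algorithm: the cosine similarity measure, the running-mean centroid update, the `threshold` parameter, and all function names are assumptions, and the merging described in step 2) is omitted because the source text is truncated there.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(vectors: List[Vector], threshold: float) -> List[Dict]:
    """Classic Single-pass: assign each vector to its most similar cluster,
    or open a new cluster when no similarity reaches `threshold`."""
    clusters: List[Dict] = []
    for idx, vec in enumerate(vectors):
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            n = len(best["members"])
            # Update the centroid as the running mean of its members.
            best["centroid"] = [(c * n + v) / (n + 1) for c, v in zip(best["centroid"], vec)]
            best["members"].append(idx)
        else:
            clusters.append({"centroid": list(vec), "members": [idx]})
    return clusters


def cluster_slices(slices: List[List[Vector]], threshold: float) -> List[List[Dict]]:
    """Cluster every slice on its own; member indices are local to each slice."""
    return [single_pass(s, threshold) for s in slices]


# Hypothetical two-dimensional topic vectors in two slices.
example_slices = [
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    [[0.1, 0.9], [0.0, 1.0]],
]
results = cluster_slices(example_slices, threshold=0.8)
```

Because each slice is processed without reference to the others, the per-slice calls are independent of one another, which is what makes this step a natural candidate for parallel execution.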


Abstract

The present invention provides a hot topic discovery method based on BTM and Single-pass. The method first uses the BTM topic model for topic modeling to obtain the topic distribution of the corpus data set, then vectorizes the result with VSM, and then clusters the vectors with the improved Single-pass algorithm, consolidating the clustering results to obtain new clustering results. The hot topic discovery method is then parallelized to speed up topic mining on large amounts of data. The invention addresses both the sparseness of microblog data and the need to process massive data. The improved Single-pass algorithm reduces computational complexity, maintains the stability of the algorithm, processes new data effectively, and supports calculation and analysis of the continuing influence of hot topics. With the MapReduce framework, data processing efficiency improves while the quality of topic discovery is maintained.
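To make the BTM and MapReduce parts of the abstract more concrete, here is a small sketch. BTM (the biterm topic model) works on biterms, that is, unordered word pairs that co-occur in the same short text. The code extracts biterms per document and counts them in a map/reduce style. How the patent actually divides work between map and reduce tasks is not described in this excerpt, so the split below, the function names, and the sample tokens are all assumptions, and the "map" phase is simulated sequentially in plain Python rather than on a real MapReduce cluster.

```python
from collections import Counter
from functools import reduce
from itertools import combinations
from typing import List, Tuple

Biterm = Tuple[str, ...]


def extract_biterms(tokens: List[str]) -> List[Biterm]:
    """Return all unordered word pairs (biterms) from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def map_slice(docs: List[List[str]]) -> Counter:
    """Map phase (assumed): count biterms within one data slice."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce phase (assumed): merge two partial biterm counts."""
    return left + right


# Hypothetical tokenized posts, already divided into two slices.
token_slices = [
    [["flood", "warning", "river"], ["river", "flood"]],
    [["flood", "river", "rescue"]],
]
partial_counts = [map_slice(s) for s in token_slices]
total_counts = reduce(reduce_counts, partial_counts, Counter())
```

Counting biterms over the whole corpus, rather than words per document, is what lets BTM cope with the sparsity of individual short texts; the sketch stops at counting and does not perform topic inference.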

Description

Technical field

[0001] The invention relates to a hot topic discovery method based on BTM and Single-pass, and belongs to text clustering in the field of data mining.

Background

[0002] With the popularity of smartphones and the Internet, people can follow the latest major national and social events at any time through the Weibo app. Discovering and studying hot topics on Weibo is of great value in business and scientific research, and more and more scholars are doing research related to Weibo.

[0003] Traditional hot topic discovery generally relies on algorithms such as the LDA topic model and K-Means. However, the traditional LDA model is mainly designed for long texts, and its processing effect on short text data such as Weibo posts is limited. At the same time, microblog data is sparse and strongly context-dependent, which is difficult for the LDA model to handle.

[0004] In order to handle massive...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/35
CPC: G06F16/353
Inventor: 许国艳, 夭荣朋, 张网娟, 平萍, 朱帅, 李敏佳
Owner: HOHAI UNIV