Hot topic detection method based on weighted LDA and improved Single-Pass clustering algorithm

A hot topic detection technology, applied in text database clustering/classification, computing, and digital data information retrieval. It addresses problems such as the difficulty of processing massive amounts of information and redundant, complicated messages, with the effects of improving clustering, enhancing differentiability, and reducing algorithmic complexity.

Active Publication Date: 2019-10-01
SICHUAN UNIV
7 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, because online news is redundant and complex, manually searching for news topics cannot cope with the massive volume of information on the network or respond to sensitive topics in a timely manner.


Embodiment Construction

[0051] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments.

[0052] As shown in Figure 1, the input of the method of the present invention is Chinese text, and the output is hot topics (including ranked topic words and the representative documents of each topic cluster). First, the text data is preprocessed, including word segmentation, stop word filtering, and feature word weighting. The LDA topic model is then used to model the text, and the vectorized text is filtered and denoised. Next, the dimensionality-reduced text is clustered with the improved Single-Pass algorithm. Finally, the hot topics among the topic clusters are identified by the hot topic detection method and presented using the topic word ranking algorithm and the document distance calculation formula. The details are as follows:
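As an illustration of the clustering stage only, the following is a minimal sketch of the baseline Single-Pass algorithm applied to document-topic vectors. The excerpt does not specify the patent's improvements, so none are reproduced here; the cosine measure, the `threshold` value, the running-mean centroid update, and the toy vectors are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass(vectors, threshold=0.8):
    """Baseline Single-Pass clustering (not the patent's improved variant).

    Each vector is compared with the centroids of the existing clusters.
    It joins the most similar cluster if the similarity reaches `threshold`
    (an assumed value); otherwise it starts a new cluster.
    """
    clusters = []
    for idx, vec in enumerate(vectors):
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Update the centroid as the running mean of its members.
            n = len(best["members"])
            best["centroid"] = [
                (c * n + x) / (n + 1) for c, x in zip(best["centroid"], vec)
            ]
            best["members"].append(idx)
        else:
            clusters.append({"members": [idx], "centroid": list(vec)})
    return clusters


# Toy document-topic vectors (hypothetical values, three topics).
docs = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
]
clusters = single_pass(docs, threshold=0.8)
print([c["members"] for c in clusters])
```

Because each document is assigned as it arrives, the baseline result depends on input order, which is the kind of time-sequence dependency the abstract says the improved variant reduces.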

[0053] Step 1: Text preprocessing; the text preprocessing of the present invention includes se...
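The preprocessing stages named in the overview (stop word filtering and feature word weighting) can be sketched roughly as follows. This is only an illustration: the text is assumed to be already segmented into tokens, the stop list is hypothetical, and smoothed TF-IDF stands in for the feature word weighting scheme, which this excerpt does not spell out.

```python
import math
from collections import Counter

# Hypothetical stop list; a real system would load a much larger one.
STOP_WORDS = {"的", "了", "是"}


def filter_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in stop_words]


def tfidf_weights(docs):
    """Weight each term per document with smoothed TF-IDF.

    `docs` is a list of token lists. TF-IDF is an assumed stand-in for
    the patent's feature word weighting, not the patent's own formula.
    """
    n = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    weighted = []
    for tokens in docs:
        term_freq = Counter(tokens)
        total = len(tokens)
        weighted.append({
            term: (count / total)
            * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return weighted


# Pre-segmented toy documents (hypothetical).
raw_docs = [["地震", "救援", "的"], ["地震", "选举"]]
filtered = [filter_stop_words(tokens) for tokens in raw_docs]
weights = tfidf_weights(filtered)
print(weights[0])
```

Terms that occur in fewer documents receive a higher weight, so "救援" outweighs "地震" in the first document.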


Abstract

The invention discloses a hot topic detection method based on weighted LDA and an improved Single-Pass clustering algorithm. The method comprises the following steps: preprocessing the text data, including Chinese word segmentation, stop word removal, and feature word weighting; modeling the text data with a weighted LDA topic model, achieving feature dimension reduction by mining the hidden topic information in the text data, and filtering and denoising the vectorized result; clustering the text vectors produced by the feature-word-weighted LDA topic model with the improved Single-Pass clustering algorithm; and calculating a hot value for each topic cluster from the topic cluster scale and the topic cluster compactness in order to identify the hot topics. The advantages of the detection method include low algorithmic complexity and low dependency on the time order of the input text.
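The abstract states that the hot value combines topic cluster scale and compactness but does not give the formula. The sketch below shows one plausible way to combine them, purely as an assumption: scale is taken as the cluster's share of all documents, compactness as the mean cosine similarity of members to the cluster centroid, and `alpha` is a hypothetical mixing weight.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hot_value(cluster_vectors, total_docs, alpha=0.5):
    """Assumed hot value: weighted sum of scale and compactness.

    This is an illustrative formula, not the one defined in the patent.
    """
    scale = len(cluster_vectors) / total_docs
    center = centroid(cluster_vectors)
    compactness = sum(
        cosine(v, center) for v in cluster_vectors
    ) / len(cluster_vectors)
    return alpha * scale + (1 - alpha) * compactness


# Two toy clusters out of five documents (hypothetical values).
big_tight = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]
small_loose = [[0.9, 0.1], [0.1, 0.9]]
ranked = sorted(
    [("big_tight", hot_value(big_tight, 5)),
     ("small_loose", hot_value(small_loose, 5))],
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)
```

Under these assumptions the larger, tighter cluster ranks first, which matches the intuition that a hot topic attracts many closely related documents.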

Description

Technical field

[0001] The invention relates to the technical field of hot topic detection, in particular to a hot topic detection method based on a feature-word-weighted Latent Dirichlet Allocation (LDA) topic model and an improved Single-Pass clustering algorithm.

Background technique

[0002] A hot topic arises when, over a period of time, news reports and Weibo posts surrounding a certain event are discussed and shared by a large number of users, so that the event receives widespread attention and eventually becomes a focus of the entire network. Hot topic detection is one of the important tasks in public opinion monitoring and guidance. It processes massive real-time data in a timely and effective manner, mines the topic structure in text data, and displays the topics that Internet users are currently focused on together with related content, to provide a convenient and accurate reference for public opinion monitors and ordinary users to grasp the development trend of...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/9536, G06F16/35
CPC: G06F16/35, G06F16/9536
Inventor: 陈兴蜀, 蒋术语, 王海舟, 王文贤, 殷明勇, 唐瑞, 蒋梦婷, 李敏毓
Owner: SICHUAN UNIV