LDA fusion model and multilayer clustering-based news topic detection method

A fusion-model and topic-detection technology, applied in special data processing applications, instruments, electronic digital data processing, etc., that improves the quality of clustering

Inactive Publication Date: 2017-12-01
TIANJIN UNIV
4 Cites, 34 Cited by

AI Technical Summary

Problems solved by technology

[0006] To overcome the deficiencies of the prior art, the present invention proposes a news topic detection method based on an LDA fusion model and multi-layer clustering. It addresses the semantic defects of the TF-IDF vector space algorithm and the time-complexity and accuracy defects of hierarchical text clustering, and improves feature extraction, representation modeling, similarity calculation, and fast, accurate text clustering for large volumes of news text.




Embodiment Construction

[0031] The present invention proposes a method for news topic detection based on LDA fusion model and multi-layer clustering, comprising the following steps:

[0032] Step 1: Use VSM to build a similarity model. Each dimension of a VSM vector holds the weight of the corresponding word. For two vectors d1 and d2, use the cosine similarity calculation method to calculate the similarity between them. The closer the cosine value is to 1, the smaller the angle between the two vectors, which means that their directions are more consistent and the similarity is higher; the closer the cosine value is to 0, the closer the two vectors are to orthogonal and the lower the similarity.
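Step 1 can be illustrated with a minimal Python sketch. It assumes documents are already tokenized and uses one common TF-IDF weighting (relative term frequency times log inverse document frequency); the patent text shown here does not specify the exact weighting, and the function names `tfidf_vectors` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (word -> weight) for tokenized documents.

    Assumption: weight = (term count / doc length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        )
    return vectors


def cosine_similarity(d1, d2):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(weight * d2.get(word, 0.0) for word, weight in d1.items())
    norm1 = math.sqrt(sum(w * w for w in d1.values()))
    norm2 = math.sqrt(sum(w * w for w in d2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm1 * norm2)


docs = [
    ["quake", "hits", "city"],
    ["quake", "shakes", "city"],
    ["team", "wins", "final"],
]
vecs = tfidf_vectors(docs)
sim_related = cosine_similarity(vecs[0], vecs[1])    # shares "quake" and "city"
sim_unrelated = cosine_similarity(vecs[0], vecs[2])  # no shared words
```

In this toy corpus the two earthquake reports share weighted terms and score above zero, while the sports report shares none and scores exactly zero.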

[0033] Step 2: Use LDA to build a topic model. The model parameters are estimated by Gibbs sampling, a method that generates a Markov chain. The chain is constructed by iteratively updating the sample values until it converges, and accurate parameter settings are finally obtaine...
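As an illustration of Step 2, the following is a minimal sketch of collapsed Gibbs sampling for LDA using only the standard library. It is a textbook formulation, not necessarily the variant used in the patent; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the fixed iteration count (in place of a convergence check) are all assumptions.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids.

    Returns (theta, phi): document-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n_dk
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_kw
    topic_total = [0] * num_topics                              # n_k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Each sweep is one step of the Markov chain over topic assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other tokens.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
         for w in range(vocab_size)]
        for t in range(num_topics)
    ]
    return theta, phi


# Toy corpus: word ids 0-2 and 3-5 tend to co-occur in separate documents.
toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 3, 4, 5]]
theta, phi = lda_gibbs(toy_docs, num_topics=2, vocab_size=6, iters=50)
```

Each row of `theta` is a document's topic distribution, which can then be compared between documents in the same way as the VSM vectors.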


Abstract

The invention belongs to the fields of data mining, natural language processing and information retrieval, and provides a news topic detection method. It addresses the semantic defects of the TF-IDF-based vector space algorithm and the time-complexity and accuracy defects of hierarchical text clustering, and improves feature extraction, representation modeling, similarity calculation, and fast, accurate text clustering for large volumes of news text. The LDA fusion model and multilayer clustering-based news topic detection method comprises the following steps: 1: building a similarity model by using a vector space model (VSM); 2: building a topic model by using LDA and finally obtaining accurate parameter settings; 3: organically fusing the two text models; 4: judging whether a topic is a new topic or not; 5: calculating the similarity until all documents are clustered; and 6: adding an ISP&AH clustering algorithm of AHC based on step 5. The method is mainly applied in design and manufacturing settings.
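Steps 3 to 5 can be sketched as follows. The abstract does not say how the two models are fused or how a new topic is decided, so this sketch assumes a linear combination of the VSM and LDA similarities and a single-pass clustering with an average-similarity threshold; the names `fused_similarity` and `single_pass_cluster`, the weight `lam`, and `threshold` are hypothetical. The ISP&AH stage of step 6 is not covered.

```python
def fused_similarity(sim_vsm, sim_lda, lam=0.5):
    """Assumed fusion: weighted sum of the VSM and LDA similarity scores."""
    return lam * sim_vsm + (1.0 - lam) * sim_lda


def single_pass_cluster(num_docs, similarity, threshold=0.5):
    """Assign documents to topics in arrival order.

    similarity(i, j) returns the similarity of documents i and j.
    A document joins the most similar existing cluster (by average
    similarity to its members) if that score reaches the threshold;
    otherwise it starts a new topic.
    """
    clusters = []
    for doc in range(num_docs):
        best_idx, best_score = None, -1.0
        for idx, members in enumerate(clusters):
            score = sum(similarity(doc, m) for m in members) / len(members)
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx is not None and best_score >= threshold:
            clusters[best_idx].append(doc)
        else:
            clusters.append([doc])  # judged to be a new topic
    return clusters


# Made-up pairwise similarities for three documents, for illustration only.
vsm = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
lda = [[1.0, 0.7, 0.0], [0.7, 1.0, 0.1], [0.0, 0.1, 1.0]]
topics = single_pass_cluster(
    3, lambda i, j: fused_similarity(vsm[i][j], lda[i][j]), threshold=0.5
)
```

With these made-up scores, documents 0 and 1 fall into one topic and document 2 starts a new one. Because single-pass results depend on document order and on the threshold, a later refinement stage such as the one named in step 6 is a natural complement.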

Description

Technical field

[0001] The invention belongs to the fields of data mining, natural language processing and information retrieval, and relates to monitoring technology and network information filtering technology, especially text analysis and topic detection methods. Specifically, it involves a news topic detection method based on a Latent Dirichlet Allocation (LDA) fusion model and multi-layer clustering.

Background technique

[0002] Topic Detection and Tracking (TDT) evolved from the earlier Event Detection and Tracking (EDT). It is a technique that automatically analyzes the content of news reports without human intervention in order to identify, mine, and organize them into categories. The vector space model (Vector Space Model, VSM) based on Term Frequency–Inverse Document Frequency (TF-IDF) has shown powerful capabilities in text representation. The vector space model is an algebraic model used to represent text files. It applies to information filtering, information retrieval, inde...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/355, G06F40/216, G06F40/30
Inventor: 喻梅, 安永利, 于健, 于瑞国, 赵满坤, 谢晓东
Owner: TIANJIN UNIV