Adaptive iteration Gibbs sampling method suitable for LDA topic model

An adaptive-iteration technique for topic models, applied in natural language data processing, instrumentation, electrical digital data processing, and related fields. It addresses the problem that the number of Gibbs sampling iterations must be set in advance and is often set inaccurately, and it reduces training time and improves efficiency.

Pending Publication Date: 2022-01-14
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0003] The technical problem to be solved by the present invention is to provide an adaptive iterative Gibbs sampling method suitable for LDA topic models. It remedies a defect of existing LDA topic model training with Gibbs sampling: the number of iterations must be set in advance and is often set inaccurately.

Examples


Embodiment 1

[0028] Example 1: This example uses 322 news texts downloaded from a metallurgical information website, dated November 1, 2016 to November 30, 2016. Following the data processing steps in the technical solution, the texts are input into the LDA topic model with the adaptive iterative Gibbs sampling algorithm for feature extraction.

[0029] As shown in Figure 1:

[0030] Step 1: Preprocess the input text dataset.

[0031] The metallurgical news dataset is segmented with jieba (version 0.42), an open-source Python word segmentation package. The Harbin Institute of Technology stop word list is then used to remove low-frequency and unimportant words from the dataset, such as "的", "地", and "得" (all romanized as "de"), as well as punctuation marks such as "?" and "!".
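The preprocessing in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's code: the `preprocess` helper, its parameters, and the tiny stop word set are hypothetical. The tokenizer is passed in as a callable so that `jieba.lcut` could be supplied when jieba is installed; whitespace splitting stands in for it here.

```python
from typing import Callable, Iterable, List, Set


def preprocess(
    docs: Iterable[str],
    tokenize: Callable[[str], List[str]],
    stopwords: Set[str],
) -> List[List[str]]:
    """Segment each document, then drop stop words, punctuation, and blanks.

    `tokenize` is any word-segmentation callable. With jieba installed,
    `jieba.lcut` could be passed here (assumption: the source does not
    say which jieba function is used).
    """
    cleaned = []
    for doc in docs:
        tokens = [t.strip() for t in tokenize(doc)]
        # Keep a token only if it is non-empty, not a stop word, and
        # contains at least one letter, digit, or CJK character.
        kept = [
            t for t in tokens
            if t and t not in stopwords and any(ch.isalnum() for ch in t)
        ]
        cleaned.append(kept)
    return cleaned


# Hypothetical stop word set. The source uses the Harbin Institute of
# Technology list, which would normally be loaded from a file.
STOPWORDS = {"的", "地", "得", "the", "of"}

# Whitespace splitting stands in for jieba so the sketch runs without
# third-party packages.
sample = ["steel output of the plant rose !", "price of iron ore ?"]
tokens = preprocess(sample, str.split, STOPWORDS)
```

Filtering on `str.isalnum` removes tokens made up only of punctuation, whether ASCII or full-width, without needing an explicit punctuation list.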

[0032] Step 2: Generate a bag-of-words model.

[0033] The metallurgical news dataset preprocessed in Step 1 is converted into a bag-of-words model using Gensim, an open-source Python toolkit, that is, the word frequency ma...
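To make the bag-of-words conversion concrete, here is a small plain-Python sketch. It is a stand-in, not the patent's code: in Gensim the equivalent objects would be `corpora.Dictionary` and its `doc2bow` method, which likewise yield sparse (token id, count) pairs per document. The helper names `build_vocab` and `to_bow` are hypothetical, and Gensim may assign ids in a different order.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_vocab(docs: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to each distinct token, in first-seen order."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for tok in doc:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def to_bow(doc: List[str], vocab: Dict[str, int]) -> List[Tuple[int, int]]:
    """Return sorted (token_id, count) pairs for one document.

    Tokens missing from the vocabulary are ignored.
    """
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


# Toy tokenized documents standing in for the preprocessed news texts.
docs = [["steel", "output", "steel"], ["iron", "steel"]]
vocab = build_vocab(docs)
corpus = [to_bow(d, vocab) for d in docs]
```

Each document becomes a short list of pairs rather than a dense row, which keeps memory use proportional to the number of distinct words per document.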


Abstract

The invention relates to an adaptive iterative Gibbs sampling method suitable for an LDA topic model, and belongs to the technical field of computing and algorithm optimization. The method comprises the following steps: first, word segmentation and stop word removal are performed on an input text dataset; the preprocessed dataset is converted into a bag-of-words model; the bag of words is input into the LDA topic model, and parameters are estimated with an adaptive iterative Gibbs sampling algorithm; and when the Gibbs sampling iteration ends automatically, the latent topic features of the text dataset are output. With the adaptive iterative Gibbs sampling algorithm, the number of iterations does not need to be set manually when estimating the parameters of the LDA topic model, the number of iterations is greatly reduced, and the efficiency with which the LDA topic model generates topic features is improved.
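The idea of a sampler that ends on its own can be sketched with a collapsed Gibbs sampler for LDA plus an early-stopping check. The stopping rule below is an assumption made for illustration, since this excerpt does not show the patent's criterion: sampling stops once the relative change in log p(w | z) stays under `tol` for `patience` consecutive sweeps. The function name, parameters, and defaults are all hypothetical.

```python
import math
import random
from typing import List, Tuple


def gibbs_lda(docs: List[List[int]], V: int, K: int,
              alpha: float = 0.1, beta: float = 0.01,
              tol: float = 1e-3, patience: int = 3,
              max_sweeps: int = 500, seed: int = 0
              ) -> Tuple[List[List[int]], int]:
    """Collapsed Gibbs sampling for LDA with an assumed stopping rule.

    docs holds word ids in [0, V). Returns the topic-word count matrix
    and the number of sweeps actually run.
    """
    rng = random.Random(seed)
    ndk = [[0] * K for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(K)]  # topic-word counts
    nk = [0] * K                       # tokens assigned to each topic
    z: List[List[int]] = []
    for d, doc in enumerate(docs):     # random initial topic assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    def log_lik() -> float:
        # log p(w | z) under a symmetric Dirichlet(beta) prior on words
        total = K * (math.lgamma(V * beta) - V * math.lgamma(beta))
        for k in range(K):
            total += sum(math.lgamma(c + beta) for c in nkw[k])
            total -= math.lgamma(nk[k] + V * beta)
        return total

    prev, calm, sweeps = log_lik(), 0, 0
    for sweeps in range(1, max_sweeps + 1):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]            # remove the token's current topic
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta)
                    / (nk[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k            # record the resampled topic
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
        cur = log_lik()
        # ASSUMED criterion: count consecutive sweeps with a small
        # relative change in the log-likelihood.
        calm = calm + 1 if abs(cur - prev) <= tol * abs(prev) else 0
        prev = cur
        if calm >= patience:
            break
    return nkw, sweeps


# Toy corpus: word ids 0 and 1 co-occur, as do word ids 2 and 3.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0], [3, 2, 2, 3]]
nkw, sweeps = gibbs_lda(docs, V=4, K=2)
```

Because Gibbs sampling is stochastic, the log-likelihood keeps fluctuating after burn-in, so `max_sweeps` remains as a hard upper bound in case the tolerance is never met.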

Description

Technical field

[0001] The invention relates to an adaptive iterative Gibbs sampling method suitable for an LDA topic model, and belongs to the technical field of computing and algorithm optimization.

Background

[0002] As one of the most popular topic models, the LDA topic model gives the topics of each text in a dataset as a probability distribution, so that once the topic distributions are extracted, the texts can be clustered by topic or classified. The LDA topic model usually uses the Gibbs sampling algorithm to approximate the topic parameters. However, the Gibbs sampling algorithm requires the number of iterations to be set in advance. In practical applications, if the number of iterations is too small, the estimated parameters will be poor. But how many iterations does it take to converge in practice? There is no reliable way to know. Under normal circumstances, the number of iterations of the Gibbs sampling algorithm is...
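For reference, the per-token update in the collapsed Gibbs sampler commonly used for LDA can be written as below. The notation is not taken from the source and is assumed here: \(\alpha\) and \(\beta\) are symmetric Dirichlet hyperparameters, \(V\) is the vocabulary size, \(n_{d,k}\) counts tokens in document \(d\) assigned to topic \(k\), \(n_{k,w}\) counts assignments of word \(w\) to topic \(k\), \(n_k = \sum_w n_{k,w}\), and the superscript \(-i\) means the counts exclude token \(i\) itself.

```latex
p\left(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}\right)
  \;\propto\;
  \left(n_{d,k}^{-i} + \alpha\right)
  \frac{n_{k,w_i}^{-i} + \beta}{n_{k}^{-i} + V\beta}
```

One iteration resamples every token's topic from this distribution, which is why the total cost grows linearly with the number of iterations chosen.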


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/216
CPC: G06F40/289, G06F40/216
Inventors: 邵党国, 李承瑶
Owner: KUNMING UNIV OF SCI & TECH