Large-Scale Text Classification Method Based on Deep Topic Model

A topic-model and text-classification technology in the field of text processing. It addresses complex gradient computation and non-adjustable gradient update step sizes, which simplifies model training, improves practicality, and accelerates convergence.

Active Publication Date: 2019-12-27
XIDIAN UNIV
Cites: 2; Cited by: 0

AI Technical Summary

Problems solved by technology

At present, stochastic gradient methods are mostly applied to shallow models and rarely to deep models. One reason is that the layers of a deep model are strongly correlated, which makes the gradient computation very complicated. Another is that when a deep topic model is trained by stochastic gradients, the gradient update step size of each layer cannot be adjusted individually.
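The per-layer step-size issue can be made concrete with a minimal sketch. It assumes, as in the usual Poisson-Gamma belief network formulation, that each layer's topic matrix has columns on the probability simplex. It swaps in two generic techniques that this excerpt does not name: an RMSprop-style running average of squared gradients (one scalar per layer) to set each layer's step size, and an exponentiated-gradient step that keeps every column on the simplex. The class and function names are hypothetical; this is not the patent's own update rule.

```python
import numpy as np


class PerLayerStepSizes:
    """Hypothetical per-layer adaptive step sizes.

    Each layer keeps its own running mean of squared gradient entries
    (an RMSprop-style rule with one scalar per layer), so every layer gets
    its own step size. Illustrative stand-in, not the patent's method.
    """

    def __init__(self, num_layers, base_lr=0.1, decay=0.9, eps=1e-8):
        self.base_lr = base_lr
        self.decay = decay
        self.eps = eps
        self.mean_sq = np.zeros(num_layers)  # one running statistic per layer

    def step_size(self, layer, grad):
        g2 = float(np.mean(np.square(grad)))
        self.mean_sq[layer] = self.decay * self.mean_sq[layer] + (1.0 - self.decay) * g2
        # Larger recent gradients in this layer -> smaller step for this layer only.
        return self.base_lr / (np.sqrt(self.mean_sq[layer]) + self.eps)


def simplex_update(phi, grad, lr):
    """One exponentiated-gradient ascent step that keeps phi's columns on the simplex.

    phi:  (rows, topics) matrix whose columns are positive and sum to 1.
    grad: gradient of the objective being maximized, same shape as phi.
    """
    logits = np.log(phi) + lr * grad
    logits -= logits.max(axis=0, keepdims=True)  # avoid overflow in exp
    new_phi = np.exp(logits)
    return new_phi / new_phi.sum(axis=0, keepdims=True)
```

In a mini-batch loop one would call `step_size(t, grad_t)` for each layer t and pass the result to `simplex_update`. Because each layer keeps its own running statistic, a layer with large gradients takes smaller steps without slowing down the other layers.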

Examples

Embodiment Construction

[0036] Referring to Figure 1, the specific implementation steps of the present invention are as follows:

[0037] Step 1: Construct the numerical training set and test set.

[0038] Randomly select a training text set and a test text set from the text corpus.
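The random selection in [0038] can be sketched as below. The function name `split_corpus`, the 20% test fraction, and the fixed seed are assumptions for illustration; the excerpt does not state a split ratio.

```python
import random


def split_corpus(docs, test_fraction=0.2, seed=0):
    """Randomly partition a list of documents into (train, test) lists.

    Hypothetical helper: the ratio and seed are illustrative, not from the patent.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    rng = random.Random(seed)  # seeded so the split is reproducible
    indices = list(range(len(docs)))
    rng.shuffle(indices)
    n_test = int(round(len(docs) * test_fraction))
    test_idx = set(indices[:n_test])
    # Both parts keep the documents' original order.
    train = [d for i, d in enumerate(docs) if i not in test_idx]
    test = [d for i, d in enumerate(docs) if i in test_idx]
    return train, test
```

Labels, if the corpus has them, can travel with the text by passing `(text, label)` pairs as the documents.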

[0039] Convert the training and test text sets from text into numerical training and test sets using the bag-of-words method.

[0040] The bag-of-words method is commonly used in text processing to convert text into numerical data; for details, see https://en.wikipedia.org/wiki/Bag-of-words_model.
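A minimal bag-of-words sketch, assuming lowercase whitespace tokenization and a vocabulary built from the training texts only, so that test documents map onto the same columns. The helper names are hypothetical; in practice a library vectorizer with proper tokenization would normally replace them. The result has one row per document; transposing it gives the words-by-documents count matrix that a Poisson-Gamma belief network models.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lowercase token in the training docs to a column index."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(docs, vocab):
    """Return one count vector per document; tokens missing from vocab are dropped."""
    matrix = []
    for doc in docs:
        counts = Counter(t for t in doc.lower().split() if t in vocab)
        row = [0] * len(vocab)
        for token, n in counts.items():
            row[vocab[token]] = n
        matrix.append(row)
    return matrix
```

For example, with a vocabulary built from `["the cat sat", "the dog sat on the mat"]`, the test sentence `"the the cat bird"` becomes `[2, 1, 0, 0, 0, 0]`: the unseen word "bird" is ignored.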

[0041] Step 2: Initialize the global parameters, hidden-variable parameters, and other network parameters of the Poisson-Gamma belief network.

[0042] 2.1) Set the maximum number of layers of the Poisson-Gamma belief network to 2, the number of samples in each mini-batch to 200, a...
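Step 2.1 can be sketched as follows. Only the two-layer depth and the mini-batch size of 200 come from the patent text. The layer widths, the Dirichlet concentration `eta`, the top-layer gamma shape `r`, and the single constant gamma rate `c` (a full PGBN gives these rates per-layer, per-document values) are assumptions. The sketch follows the usual PGBN generative structure as we understand it: each column of Φ^(t) is Dirichlet-distributed, θ^(T) ~ Gam(r, 1/c), θ^(t) ~ Gam(Φ^(t+1) θ^(t+1), 1/c), and word counts x ~ Pois(Φ^(1) θ^(1)). Hidden units are initialized by drawing them top-down from this prior. Function names are hypothetical.

```python
import numpy as np


def init_pgbn(vocab_size, layer_widths, eta=0.1, seed=0):
    """Draw initial global parameters (hypothetical helper).

    Returns phis, where phis[t] has shape (K_{t-1}, K_t) with K_0 = vocab_size
    and every column sums to 1, plus the assumed top-layer gamma shape r.
    """
    rng = np.random.default_rng(seed)
    sizes = [vocab_size] + list(layer_widths)
    phis = []
    for below, above in zip(sizes[:-1], sizes[1:]):
        # dirichlet(...) returns `above` rows, each a distribution over `below`
        # units; transpose so each topic becomes a column.
        phis.append(rng.dirichlet(np.full(below, eta), size=above).T)
    r = np.full(sizes[-1], 1.0 / sizes[-1])  # assumed top-layer gamma shape
    return phis, r


def sample_thetas(phis, r, batch_size, c=1.0, seed=0):
    """Top-down draw of hidden units theta for one mini-batch (one column per document)."""
    rng = np.random.default_rng(seed)
    # Top layer: theta^(T) ~ Gamma(shape=r, scale=1/c).
    theta = rng.gamma(r[:, None], 1.0 / c, size=(r.size, batch_size))
    thetas = [theta]
    # Lower layers: theta^(t) ~ Gamma(shape=Phi^(t+1) theta^(t+1), scale=1/c).
    for phi in reversed(phis[1:]):
        theta = rng.gamma(phi @ theta, 1.0 / c)
        thetas.insert(0, theta)
    return thetas
```

For example, `phis, r = init_pgbn(vocab_size=1000, layer_widths=[128, 64])` followed by `thetas = sample_thetas(phis, r, batch_size=200)` yields initial hidden units for one mini-batch, and `phis[0] @ thetas[0]` gives the corresponding Poisson rates for the word counts.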

Abstract

The invention discloses a text analysis method based on a deep topic model, which mainly solves two problems in the prior art: the gradient is difficult to compute, and the gradient update step size of each layer cannot be adjusted. The main steps are: 1. construct a training set and a test set; 2. set up the Poisson-Gamma belief network and its parameters, and train the network by gradient updates on the training set from step 1; 3. save the global parameters of the trained network; 4. initialize the global parameters of a test network with the values saved in step 3, and train the test network on the test set from step 1; 5. save the parameters of the test network; 6. classify the text using the test-network parameters obtained in step 5, and output the predicted text categories and the classification accuracy. The invention can extract multi-layer information from text, solves the problems that the gradient is difficult to compute and that the step size of each layer cannot be adjusted, and can be used for text information extraction and text classification.
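Step 6 of the abstract does not name the classifier in this excerpt. A minimal sketch, assuming each document is represented by its inferred hidden units θ (for example, the first-layer θ^(1) transposed to one row per document, or several layers concatenated to use the multi-layer information), with a nearest-centroid rule swapped in as a stand-in classifier. All names are hypothetical.

```python
import numpy as np


def fit_centroids(features, labels):
    """Mean feature vector per class. features: (n_docs, K); labels: (n_docs,) array."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(features, classes, centroids):
    """Assign each document to the class with the nearest centroid."""
    # Squared Euclidean distance from every document to every class centroid.
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]


def accuracy(predicted, true_labels):
    """Fraction of documents whose predicted class matches the true class."""
    return float(np.mean(predicted == true_labels))
```

The centroids would be fitted on training-document features and applied to test-document features; a linear classifier such as an SVM could replace the nearest-centroid rule without changing the surrounding steps.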

Description

Technical Field
[0001] The invention belongs to the technical field of text processing, and in particular relates to a large-scale text classification method using a deep topic model trained by adaptive-step-size stochastic gradients, which can be used for text data mining and classification.

Background Art
[0002] Topic models used in natural language processing are increasingly numerous and mature, but they differ in training method and practicality. Some shallow models, such as the commonly used LDA topic model, are practical but perform worse than deep models. Deep topic models perform better than shallow ones, but their training is complicated and slow, so they lack practicality, especially when the data set is very large, such as the Poisson-Gamma Belief Netwo...

Application Information

Patent Type & Authority: Patent (China)
IPC (8th ed.): G06F16/35; G06K9/62
CPC: G06F16/35; G06F18/2411
Inventors: Chen Bo (陈渤), Li Qianyong (李千勇), Cong Yulai (丛玉来), Guo Dandan (郭丹丹)
Owner: XIDIAN UNIV