Deep topic model-based large-scale text classification method

A topic model and text classification technology, applied in the field of text processing, which solves the problems of complex gradient computation and the inability to adjust the per-layer gradient update step size, and achieves the effects of simplified model training, enhanced practicability, and accelerated convergence.

Active Publication Date: 2017-04-26
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

At present, the stochastic gradient method is mostly applied to shallow models and rarely to deep models, not only because the strong correlations between the layers of a deep model make the gradient computation very complicated, but also because, when a deep topic model is trained with the stochastic gradient method, the gradient update step size of each layer cannot be adjusted.



Examples


Embodiment Construction

[0036] Referring to figure 1, the specific implementation steps of the present invention are as follows:

[0037] Step 1, construct the training set and test set of digital information.

[0038] Randomly select a training text set and a test text set from the text corpus;

[0039] The training text set and the test text set are converted from text information into a training set and a test set of digital information by using the bag-of-words method;

[0040] The bag-of-words method is commonly used in the field of text processing to convert text information into digital information. For details, refer to https://en.wikipedia.org/wiki/Bag-of-words_model.
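As a hedged illustration of this conversion (not the patent's exact preprocessing), the short Python sketch below turns a few made-up example texts into word-count vectors over a shared vocabulary; the use of scikit-learn's CountVectorizer is an assumption for illustration only.

# A minimal bag-of-words sketch: each document becomes a vector of word counts
# over a vocabulary built from the training texts.
from sklearn.feature_extraction.text import CountVectorizer

train_texts = ["the cat sat on the mat", "deep topic models extract topics from text"]
test_texts = ["the cat and the topic model"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # training set of digital (count) information
X_test = vectorizer.transform(test_texts)        # test set mapped to the same vocabulary

print(vectorizer.get_feature_names_out())        # the shared vocabulary
print(X_train.toarray())                         # one count vector per training document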

[0041] Step 2, initialize the global parameters, hidden-variable parameters, and other network parameters of the Poisson-Gamma belief network.

[0042] 2.1) Set the maximum number of layers of the Poisson-Gamma belief network to 2, the number of samples in a single mini-batch to 200, a...
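For concreteness, the following is a minimal initialization sketch under assumptions: only the two-layer depth and the mini-batch size of 200 come from the text above; the vocabulary size, layer widths, and gamma-based random initialization are illustrative placeholders, not the patent's actual settings.

# Hedged sketch: initialize global and hidden-variable parameters of a
# two-layer Poisson-Gamma belief network.
import numpy as np

rng = np.random.default_rng(0)

V = 2000          # vocabulary size (assumed)
K = [128, 64]     # hidden units per layer (assumed)
T = 2             # maximum number of layers, as set in 2.1
batch_size = 200  # samples per mini-batch, as set in 2.1

# Global parameters: factor-loading matrices Phi^(t), columns normalized to the simplex.
sizes = [V] + K
Phi = []
for t in range(T):
    phi_t = rng.gamma(1.0, 1.0, size=(sizes[t], sizes[t + 1]))
    Phi.append(phi_t / phi_t.sum(axis=0, keepdims=True))

# Hidden-variable parameters: top-layer gamma shape r and per-document weights Theta^(t).
r = np.ones(K[-1])
Theta = [rng.gamma(1.0, 1.0, size=(k, batch_size)) for k in K]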



Abstract

The invention discloses a deep topic model-based large-scale text classification method, which mainly solves the prior-art problems that the gradient is difficult to compute and the gradient update step size of each layer cannot be adjusted. The method mainly comprises the steps of: 1, establishing a training set and a test set; 2, setting up a Poisson-Gamma belief network and its network parameters, and performing gradient-update training on the network with the training set from step 1; 3, storing the trained global network parameters; 4, initializing the global parameters of a test network with the values stored in step 3, and training the test network with the test set from step 1; 5, storing the network parameters of the test network; and 6, classifying texts with the test-network parameters obtained in step 5, and outputting the predicted text categories and the text classification accuracy. The method can extract multi-layer information from text, solves the problems that the gradient is difficult to compute and the step size of each layer cannot be adjusted, and can be used for text information extraction and text classification.
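The abstract does not show which classifier turns the trained test-network parameters into text categories in step 6; as one hedged illustration only, the per-document topic weights could be fed to a standard linear SVM, with random stand-in arrays below in place of the actual trained parameters and labels.

# Illustrative only: classify documents from (stand-in) topic-weight features.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
theta_train = rng.gamma(1.0, 1.0, size=(500, 128))  # stand-in for trained hidden variables
theta_test = rng.gamma(1.0, 1.0, size=(100, 128))   # stand-in for test-network hidden variables
y_train = rng.integers(0, 5, size=500)              # stand-in text category labels
y_test = rng.integers(0, 5, size=100)

clf = LinearSVC().fit(theta_train, y_train)
y_pred = clf.predict(theta_test)
print("predicted text categories:", y_pred[:10])
print("text classification accuracy:", accuracy_score(y_test, y_pred))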

Description

Technical field

[0001] The invention belongs to the technical field of text processing, and in particular relates to a large-scale text classification method based on a deep topic model trained by adaptive-step-size stochastic gradients, which can be used for text data mining and classification.

Background technique

[0002] At present, the topic models used in natural language processing are both increasingly numerous and increasingly mature, but they differ in training method and in practicability. Some shallow models are quite practical but fall short of deep models in performance; the commonly used LDA topic model is one of them. Deep topic models improve on shallow topic models in performance, but their training is complicated and takes too long, especially when the data set is very large, so they lack practicability; one example is the Poisson-Gamma Belief Netwo...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30; G06K9/62
CPC: G06F16/35; G06F18/2411
Inventors: 陈渤, 李千勇, 丛玉来, 郭丹丹
Owner: XIDIAN UNIV