Deep topic model-based large-scale text classification method

A topic model and text classification technology, applied in the field of text processing, which solves the problems of complex gradient computation and the inability to adjust the per-layer gradient update step size, and achieves the effects of simplified model training, enhanced practicability, and accelerated convergence.

Active Publication Date: 2017-04-26
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

At present, the stochastic gradient method is mostly applied to shallow models and rarely to deep models, not only because the strong correlations between the layers of a deep model make the gradient computation very complicated, but also because, when a deep topic model is trained with the stochastic gradient method, the gradient update step size of each layer cannot be adjusted.



Examples


Embodiment Construction

[0036] Referring to figure 1, the specific implementation steps of the present invention are as follows:

[0037] Step 1, construct the training set and test set of digital information.

[0038] Randomly select a training text set and a test text set from the text corpus;

[0039] The training text set and the test text set are converted from text information into a training set and a test set of digital information by using the bag-of-words method;

[0040] The bag-of-words method is commonly used in the field of text processing to convert text information into digital information. For details, refer to https://en.wikipedia.org/wiki/Bag-of-words_model.
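As a hedged illustration of this conversion (not the patent's exact preprocessing), the short Python sketch below turns a few made-up example texts into word-count vectors over a shared vocabulary; the use of scikit-learn's CountVectorizer is an assumption for illustration only.

# A minimal bag-of-words sketch: each document becomes a vector of word counts
# over a vocabulary built from the training texts.
from sklearn.feature_extraction.text import CountVectorizer

train_texts = ["the cat sat on the mat", "deep topic models extract topics from text"]
test_texts = ["the cat and the topic model"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # training set of digital (count) information
X_test = vectorizer.transform(test_texts)        # test set mapped to the same vocabulary

print(vectorizer.get_feature_names_out())        # the shared vocabulary
print(X_train.toarray())                         # one count vector per training document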

[0041] Step 2, initialize the global parameters, hidden-variable parameters, and other network parameters of the Poisson-Gamma belief network.

[0042] 2.1) Set the maximum number of layers of the Poisson-Gamma belief network to 2, the number of samples in a single mini-batch to 200, a...
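For concreteness, the following is a minimal initialization sketch under assumptions: only the two-layer depth and the mini-batch size of 200 come from the text above; the vocabulary size, layer widths, and gamma-based random initialization are illustrative placeholders, not the patent's actual settings.

# Hedged sketch: initialize global and hidden-variable parameters of a
# two-layer Poisson-Gamma belief network.
import numpy as np

rng = np.random.default_rng(0)

V = 2000          # vocabulary size (assumed)
K = [128, 64]     # hidden units per layer (assumed)
T = 2             # maximum number of layers, as set in 2.1
batch_size = 200  # samples per mini-batch, as set in 2.1

# Global parameters: factor-loading matrices Phi^(t), columns normalized to the simplex.
sizes = [V] + K
Phi = []
for t in range(T):
    phi_t = rng.gamma(1.0, 1.0, size=(sizes[t], sizes[t + 1]))
    Phi.append(phi_t / phi_t.sum(axis=0, keepdims=True))

# Hidden-variable parameters: top-layer gamma shape r and per-document weights Theta^(t).
r = np.ones(K[-1])
Theta = [rng.gamma(1.0, 1.0, size=(k, batch_size)) for k in K]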



Abstract

The invention discloses a deep topic model-based large-scale text classification method, which mainly solves the prior-art problems that the gradient is difficult to compute and the gradient update step size of each layer cannot be adjusted. The method mainly comprises the steps of: 1, establishing a training set and a test set; 2, setting up a Poisson-Gamma belief network and its network parameters, and performing gradient-update training on the network with the training set from step 1; 3, storing the trained global network parameters; 4, initializing the global parameters of a test network with the values stored in step 3, and training the test network with the test set from step 1; 5, storing the network parameters of the test network; and 6, classifying texts with the test-network parameters obtained in step 5, and outputting the predicted text categories and the text classification accuracy. The method can extract multi-layer information from text, solves the problems that the gradient is difficult to compute and the step size of each layer cannot be adjusted, and can be used for text information extraction and text classification.
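The abstract does not show which classifier turns the trained test-network parameters into text categories in step 6; as one hedged illustration only, the per-document topic weights could be fed to a standard linear SVM, with random stand-in arrays below in place of the actual trained parameters and labels.

# Illustrative only: classify documents from (stand-in) topic-weight features.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
theta_train = rng.gamma(1.0, 1.0, size=(500, 128))  # stand-in for trained hidden variables
theta_test = rng.gamma(1.0, 1.0, size=(100, 128))   # stand-in for test-network hidden variables
y_train = rng.integers(0, 5, size=500)              # stand-in text category labels
y_test = rng.integers(0, 5, size=100)

clf = LinearSVC().fit(theta_train, y_train)
y_pred = clf.predict(theta_test)
print("predicted text categories:", y_pred[:10])
print("text classification accuracy:", accuracy_score(y_test, y_pred))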

Description

Technical field

[0001] The invention belongs to the technical field of text processing, and in particular relates to a large-scale text classification method based on a deep topic model trained by adaptive-step-size stochastic gradients, which can be used for text data mining and classification.

Background technique

[0002] At present, the topic models used in natural language processing are both increasingly numerous and increasingly mature, but they differ in training method and in practicability. Some shallow models are quite practical but fall short of deep models in performance; the commonly used LDA topic model is one of them. Deep topic models improve on shallow topic models in performance, but their training is complicated and takes too long, especially when the data set is very large, so they lack practicability; one example is the Poisson-Gamma Belief Netwo...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30; G06K9/62
CPC: G06F16/35; G06F18/2411
Inventors: 陈渤, 李千勇, 丛玉来, 郭丹丹
Owner: XIDIAN UNIV