
Text classification method and system based on deep learning of hybrid automatic encoder

A hybrid autoencoder technique, applied in text database clustering/classification, unstructured text data retrieval, instrumentation, etc. It addresses the high training and classification time overhead and the reduced classification accuracy caused by high-dimensional, sparse features, with the aim of achieving a better feature learning effect and improved text classification accuracy.

Inactive Publication Date: 2018-02-23
QILU UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0006] 1. It incurs substantial overhead in training and classification time;
[0007] 2. Too many features often lead to the so-called "curse of dimensionality". High dimensionality makes the extracted features inaccurate and reduces classification accuracy.
Although the accuracy of text classification is improved, the problems of long classification time and low accuracy caused by the high dimensionality and sparseness of massive data features in text classification remain unsolved.
[0009] To sum up, the prior art still lacks an effective solution to the problems of long classification time and low accuracy caused by the high dimensionality and sparseness of massive data features in text classification.



Examples


Embodiment 1

[0073] The purpose of Embodiment 1 is to provide a text classification method based on hybrid autoencoder deep learning. Specifically, the method combines the sparse restricted Boltzmann machine (SRBM) and the contractive autoencoder (CAE) to form a hybrid autoencoder training model. The model combines the robust feature extraction of the CAE with the sparse feature representation of the SRBM and the fast learning offered by contrastive divergence, which enhances the learning ability of the hybrid autoencoder. An unsupervised greedy layer-wise learning algorithm is used to train the model, Polyak averaging is added to the parameter updates to speed up parameter convergence, and the backpropagation (BP) algorithm fine-tunes the model. Finally, through support vector machine (SVM) classification, the classification accuracy requirements for the classification ...
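The "robust feature extraction" attributed to the contractive autoencoder usually comes from a penalty on how sensitive the hidden code is to small changes in the input. The following is a minimal sketch of that penalty for a single sigmoid encoder layer, not the patent's actual implementation: the function names, the toy weights, and the choice of a sigmoid activation are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(W, b, x):
    """Hidden activations h = sigmoid(W x + b) for one input vector x."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def contractive_penalty(W, b, x):
    """Squared Frobenius norm of dh/dx for a sigmoid encoder.

    Since dh_i/dx_j = h_i * (1 - h_i) * W[i][j], the norm factorises per
    hidden unit into (h_i * (1 - h_i))**2 * sum_j W[i][j]**2.
    """
    h = encode(W, b, x)
    return sum((hi * (1.0 - hi)) ** 2 * sum(w * w for w in row)
               for hi, row in zip(h, W))

# Hypothetical toy weights: 2 hidden units, 3 input features.
W = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b = [0.0, 0.0]
x = [0.0, 0.0, 0.0]
penalty = contractive_penalty(W, b, x)  # 0.0625: h = 0.5, (0.25)**2 * 1
```

In a full training objective this term would be added, scaled by a weighting coefficient, to the reconstruction error; how the patent weights it and combines it with the SRBM is not shown in the excerpt above.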

Embodiment 2

[0123] The purpose of Embodiment 2 is to provide a computer-readable storage medium.

[0124] In order to achieve the above object, the present invention adopts the following technical scheme:

[0125] A computer-readable storage medium, in which a plurality of instructions are stored, and the instructions are adapted to be loaded by a processor of a terminal device and perform the following processing:

[0126] Obtain text data and perform preprocessing;

[0127] Perform feature learning on the preprocessed text data using the hybrid autoencoder training model, where the hybrid autoencoder training model is formed by adding the sparse restricted Boltzmann machine (SRBM) to the contractive autoencoder (CAE) network;

[0128] Classify the text data after feature learning.
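The excerpt does not say what the preprocessing step consists of. As one plausible reading, the sketch below turns raw text into binary bag-of-words vectors, a common input format for RBM-style visible units. The tokenizer, the vocabulary construction, and all function names are assumptions for illustration, not details taken from the patent.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocab(docs):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_binary_bow(doc, vocab):
    """Binary bag-of-words: 1 if the token occurs in the document, else 0."""
    vec = [0] * len(vocab)
    for tok in tokenize(doc):
        if tok in vocab:
            vec[vocab[tok]] = 1
    return vec

# Hypothetical two-document corpus.
docs = ["Deep learning", "deep text"]
vocab = build_vocab(docs)            # {"deep": 0, "learning": 1, "text": 2}
features = to_binary_bow("Text text deep", vocab)  # [1, 0, 1]
```

Vectors of this kind grow with the vocabulary size and are mostly zeros, which is the high-dimensional, sparse setting the feature-learning step is meant to compress before classification.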

[0129] In this embodiment, examples of the computer-readable recording medium include magnetic storage media (for example, ROM, RAM, USB, floppy disk, hard disk, etc.), optical recording media (for example, CD-ROM ...

Embodiment 3

[0131] The purpose of Embodiment 3 is to provide a terminal device.

[0132] In order to achieve the above object, the present invention adopts the following technical scheme:

[0133] A terminal device, including a processor and a computer-readable storage medium, the processor is used to implement instructions; the computer-readable storage medium is used to store multiple instructions, and the instructions are suitable for being loaded by the processor and performing the following processing:

[0134] Obtain text data and perform preprocessing;

[0135] Perform feature learning on the preprocessed text data using the hybrid autoencoder training model, where the hybrid autoencoder training model is formed by adding the sparse restricted Boltzmann machine (SRBM) to the contractive autoencoder (CAE) network;

[0136] Classify the text data after feature learning.

[0137] Beneficial effects of the present invention:

[0138] 1. In the text classification method and system based on h...



Abstract

The invention relates to a text classification method and system based on hybrid autoencoder deep learning. The method combines a sparse restricted Boltzmann machine (SRBM) with a contractive autoencoder (CAE) to form a hybrid autoencoder training model. It combines the robust feature extraction of the CAE with the sparse feature representation and fast contrastive-divergence learning of the SRBM, which improves the learning capability of the hybrid autoencoder and reduces the dimension of the feature space. An unsupervised greedy layer-wise learning algorithm is used to train the model, Polyak averaging is added to the parameter updates to increase the parameter convergence rate, and the model is fine-tuned with the back-propagation (BP) algorithm. Finally, a support vector machine (SVM) performs the classification; the text feature dimension is reduced and the accuracy of text classification is improved.
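Polyak averaging, as commonly described, keeps a running mean of the parameter iterates alongside the ordinary updates and uses that mean as the final estimate. The sketch below shows the idea on a one-dimensional toy objective; the function name, the learning rate, and the quadratic objective are illustrative assumptions, and the patent's exact update rule is not given in this excerpt.

```python
def sgd_with_polyak(grad, w0, lr, steps):
    """Run plain gradient steps and keep a running mean of the iterates."""
    w = w0
    w_avg = 0.0
    for t in range(1, steps + 1):
        w = w - lr * grad(w)          # ordinary gradient step
        w_avg += (w - w_avg) / t      # incremental mean of w_1 .. w_t
    return w, w_avg

# Toy objective f(w) = 0.5 * (w - 3)**2, so grad(w) = w - 3.
w_last, w_avg = sgd_with_polyak(lambda w: w - 3.0, w0=0.0, lr=0.5, steps=3)
# Iterates are 1.5, 2.25, 2.625, so w_avg = 2.125 and w_last = 2.625.
```

On this noise-free toy problem the average simply lags behind the last iterate; the technique is generally motivated by noisy stochastic gradients, where averaging smooths out fluctuations in the iterates.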

Description

Technical field

[0001] The invention belongs to the technical field of data classification processing, and in particular relates to a text classification method and system based on hybrid autoencoder deep learning.

Background technique

[0002] With the rapid development of network technology, massive information resources exist in the form of text. People urgently hope to quickly and effectively find the content they are interested in from the explosive wave of information. As an important research direction of information processing, text classification is a common method to solve text information discovery. However, for massive data, the high dimensionality of features brings many problems to text classification, which cannot meet people's needs for acquiring useful knowledge.

[0003] Deep learning is an unsupervised feature learning and feature hierarchy learning method. The unsupervised learning method is generally a feature learning method that realizes feature extr...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 杨振宇靖慧
Owner: QILU UNIV OF TECH