Chinese text classification method based on super-deep convolutional neural network structure model

A text classification and convolutional neural network technology, applied in the fields of natural language processing and deep learning, which can solve problems such as insufficient accuracy, difficulty in determining the convolution kernel size, and excessively high vector dimensions.

Inactive Publication Date: 2017-10-27
HEBEI UNIV OF TECH


Problems solved by technology

[0006] In view of the deficiencies in the prior art, the technical problem to be solved by the present invention is to provide a Chinese text classification method based on an ultra-deep convolutional neural network structure (VDCNN for short) model, which solves the problems of excessively high vector dimensions in Chinese text classification, the difficulty of determining the convolution kernel size, vanishing gradients, and insufficient accuracy in traditional convolutional neural networks.


Examples


Embodiment 1

[0033] This embodiment is a Chinese text classification method based on the ultra-deep convolutional neural network structure model; the method comprises the following steps:

[0034] Step 1: Collect a training corpus for word vectors from the Internet, segment the training corpus with the jieba word segmentation tool while removing stop words, and build a dictionary D; then use the Skip-gram model in the Word2Vec tool to train the word vector corresponding to each word in the dictionary. The Skip-gram model (see figure 1) predicts the words in the context Context(w) of the current word w(t), given the current word w(t). The Skip-gram model consists of three layers: an input layer, a projection layer and an output layer;
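As a rough illustration of Step 1 (not part of the patent), the sketch below segments a Chinese corpus with jieba, drops stop words, and trains Skip-gram word vectors with gensim's Word2Vec; the file names corpus.txt and stopwords.txt and all parameter values are assumptions made for the example.

```python
# Sketch of Step 1: segment a Chinese corpus with jieba, remove stop words,
# and train Skip-gram word vectors with gensim's Word2Vec (sg=1 selects Skip-gram).
# File names and parameter values are illustrative assumptions.
import jieba
from gensim.models import Word2Vec

with open("stopwords.txt", encoding="utf-8") as f:
    stopwords = set(line.strip() for line in f)

sentences = []
with open("corpus.txt", encoding="utf-8") as f:
    for line in f:
        words = [w for w in jieba.cut(line.strip()) if w and w not in stopwords]
        if words:
            sentences.append(words)

# vector_size is the word-vector dimension; sg=1 chooses the Skip-gram model.
model = Word2Vec(sentences, vector_size=128, window=5, min_count=2, sg=1)
model.save("word2vec_skipgram.model")

# The dictionary D corresponds to the model's vocabulary; each entry has a trained vector:
# vec = model.wv["文本"]
```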

[0035] The input to the input layer (INPUT) is the current word w(t); the projection layer (PROJECTION) is an identity projection of the input layer, retained to correspond to the projection layer in the CBOW model; and ...
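To make the three layers concrete, the following minimal sketch (an illustrative assumption, not the patent's implementation) shows a Skip-gram forward pass: the input layer holds the index of the current word w(t), the projection layer is an embedding lookup of that word's vector, and the output layer assigns a probability to every vocabulary word as a candidate context word.

```python
# Minimal Skip-gram forward pass (illustrative assumption):
# input layer  -> index of the current word w(t)
# projection   -> embedding lookup, i.e. identity projection of w(t)'s vector
# output layer -> softmax scores over the vocabulary for context words
import numpy as np

vocab_size, dim = 1000, 128
W_in = np.random.randn(vocab_size, dim) * 0.01   # input word vectors
W_out = np.random.randn(dim, vocab_size) * 0.01  # output (context) weights

def skipgram_forward(center_word_id):
    v = W_in[center_word_id]                  # projection layer: embedding lookup
    scores = v @ W_out                        # output layer: one score per vocabulary word
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()                # probability of each word appearing in Context(w)

p = skipgram_forward(42)  # distribution over candidate context words for word id 42
```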



Abstract

The invention provides a Chinese text classification method based on a super-deep convolutional neural network structure model. The method comprises the steps of: collecting a training corpus for word vectors from the Internet, segmenting the training corpus with a Chinese word segmentation algorithm, and obtaining a word vector model; collecting news from multiple Chinese news websites on the Internet and labeling the category of each news item to form a corpus set for text classification, the corpus set being divided into a training set corpus and a test set corpus; segmenting the training set corpus and the test set corpus respectively, and then using the word vector model to obtain the word vectors corresponding to each; building the super-deep convolutional neural network structure model; inputting the word vectors corresponding to the training set corpus into the super-deep convolutional neural network structure model and training it to obtain a text classification model; and inputting the Chinese text to be classified into the word vector model to obtain its word vectors, and then inputting those word vectors into the text classification model to complete the Chinese text classification.
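A compact model sketch may help picture the classifier described in the abstract. The block below is only an assumed VDCNN-style outline in Keras; the sequence length, embedding dimension, number of convolutional blocks, and class count are illustrative, since the abstract does not fix the exact architecture, and global max pooling stands in for the k-max pooling commonly used in very deep text CNNs.

```python
# Illustrative VDCNN-style classifier sketch (assumed layer sizes, not the patent's exact model).
# Input: a sequence of pre-trained word vectors; output: probabilities over news categories.
from tensorflow.keras import layers, models

SEQ_LEN, EMB_DIM, NUM_CLASSES = 256, 128, 10   # assumed values

def conv_block(x, filters):
    # Two stacked small-kernel convolutions with batch normalization, the basic deep-CNN unit.
    for _ in range(2):
        x = layers.Conv1D(filters, kernel_size=3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

inputs = layers.Input(shape=(SEQ_LEN, EMB_DIM))      # word vectors from the Word2Vec model
x = layers.Conv1D(64, 3, padding="same")(inputs)
for filters in (64, 128, 256, 512):                  # depth grows while sequence length shrinks
    x = conv_block(x, filters)
    x = layers.MaxPooling1D(pool_size=2)(x)
x = layers.GlobalMaxPooling1D()(x)                   # stands in for k-max pooling
x = layers.Dense(256, activation="relu")(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_vectors, train_labels, validation_data=(test_vectors, test_labels))
```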

Description

technical field

[0001] The invention relates to the technical fields of natural language processing and deep learning, in particular to a Chinese text classification method based on an ultra-deep convolutional neural network structure model.

Background technique

[0002] With the explosive growth of network platforms such as the mobile Internet, social networking and new media, the network is filled with a large number of texts that lack effective information organization but have research value. Text classification, as one of the key technologies of natural language processing, can effectively address this problem of disorganized information and is widely used in tasks such as search engines, spam filtering, personalized news and data sorting. Therefore, text classification plays an important role in natural language processing, the intelligent organization and management of data, and other fields.

[0003] Traditional text classification mainly relies on knowledge-engineering taxonomies. First, ...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/30; G06F17/27
CPC: G06F16/353; G06F40/289
Inventor: 彭玉青, 宋初柏, 闫倩, 赵晓松, 魏铭
Owner: HEBEI UNIV OF TECH