Word vector model based on point mutual information and text classification method based on CNN

A word vector model based on point mutual information (pointwise mutual information, PMI) and a CNN-based text classification method, applied in the field of text classification. The technology addresses problems of earlier approaches such as gradient explosion, neural networks that lack time parameters, and drastic changes in decision tree structure.

Active Publication Date: 2019-01-11
NANJING SILICON INTELLIGENCE TECH CO LTD
Cites: 2; Cited by: 37

AI Technical Summary

Problems solved by technology

When m is large, storing and computing the matrix consumes a great deal of machine memory and computing time;
[0017] Second, support vector machines have difficulty solving multi-class classification problems.
[0021] First, the decision tree algorithm overfits very easily, resulting in poor generalization ability;
[0022] Second, a small change in the samples can cause drastic changes in the tree structure;
[0023] Third, decision trees have difficulty learning some complex relationships, such as XOR.
[0028] Second, approximating complex functions more accurately requires more hidden layers, which makes the network prone to vanishing or exploding gradients;
[0029] Third, time series data (e.g. audio, text) cannot be processed because the neural networks do not contain time parameters.
[0038] However, LSTM only avoids the vanishing gradient problem of RNNs; it cannot resist the exploding gradient problem.
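
The gradient disappearance (vanishing) and gradient explosion problems listed above can be illustrated with a toy scalar chain. This is a minimal sketch under simplifying assumptions: every layer is assumed to share the same weight and pre-activation, and the function names and numbers are hypothetical, chosen only to show how a per-layer factor below or above 1 compounds with depth.

```python
import math


def sigmoid_derivative(z: float) -> float:
    """Derivative of the logistic sigmoid at z; its maximum is 0.25 at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def backprop_factor(weight: float, pre_activation: float, depth: int) -> float:
    """Magnitude of the gradient factor accumulated across `depth` identical layers.

    Assumption: each layer contributes |weight| * sigmoid'(pre_activation),
    which is a simplification of backpropagation in a real network.
    """
    per_layer = abs(weight) * sigmoid_derivative(pre_activation)
    return per_layer ** depth


# Per-layer factor 0.25 (< 1): the gradient shrinks toward zero with depth.
vanishing = backprop_factor(weight=1.0, pre_activation=0.0, depth=20)

# Per-layer factor 2.0 (> 1): the gradient grows exponentially with depth.
exploding = backprop_factor(weight=8.0, pre_activation=0.0, depth=20)

print(f"factor with small weights: {vanishing:.3e}")
print(f"factor with large weights: {exploding:.3e}")
```

With a per-layer factor below 1 the product collapses toward zero, and with a factor above 1 it blows up, which is the behavior the excerpts describe for deep networks.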

Embodiment Construction

[0104] The foregoing summary of the invention has described the solutions of the present invention in sufficient detail. The specific implementation of the present invention is described in detail below with reference to the drawings and specific embodiments, but the implementation of the present invention is not limited thereto. Any processes or symbols that are not specifically described below can be understood or implemented by those skilled in the art with reference to the prior art. For example, conventional CNN parameters such as w and b can all be understood with reference to existing CNN theory and will not be repeated below.
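
As a rough illustration of what the conventional parameters w and b do in a text CNN, the sketch below slides one filter w with bias b over a word vector matrix. The function name `conv1d_max_pool`, the ReLU activation, the max-over-time pooling, and the toy numbers are assumptions made for illustration; they are not details taken from the patent.

```python
def conv1d_max_pool(word_vectors, w, b):
    """Apply one convolution filter over a text and pool the result.

    word_vectors: list of n rows, one d-dimensional word vector per token.
    w: filter weights, h rows of d floats (h = number of consecutive words covered).
    b: scalar bias added to every window.
    Returns the largest ReLU activation over all windows (one feature per filter).
    """
    h = len(w)
    if len(word_vectors) < h:
        raise ValueError("text is shorter than the filter height")
    activations = []
    for start in range(len(word_vectors) - h + 1):
        window = word_vectors[start:start + h]
        # Weighted sum of the window plus bias: z = sum(w * x) + b
        z = b + sum(
            w[r][k] * window[r][k]
            for r in range(h)
            for k in range(len(w[r]))
        )
        activations.append(max(0.0, z))  # ReLU (assumed activation)
    return max(activations)  # max-over-time pooling (assumed pooling)


# Toy input: three tokens with 2-dimensional word vectors (hypothetical values).
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, -1.0], [0.5, 0.5]]  # one filter spanning two consecutive words
b = 0.1
feature = conv1d_max_pool(word_vectors, w, b)
print(feature)
```

A real model would use many such filters, possibly with several heights, and learn w and b by training; the sketch only shows how the two parameters enter the computation.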

[0105] Referring to Figure 1, in this example the word vector model based on point mutual information and the CNN-based text classification method include:

[0106] (S1) training the word vector model by the global word vector method based on point mutu...
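
Step (S1) is truncated here, so the exact training objective is not shown. As background only, the standard definition of point (pointwise) mutual information, PMI(w, c) = log(p(w, c) / (p(w) p(c))), can be computed from windowed co-occurrence counts as in this minimal sketch. The helper names, the window size, and the toy corpus are hypothetical, and the code does not attempt to reproduce the patent's word vector training.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=1):
    """Count (word, context) pairs that appear within `window` tokens of each other."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    return pair_counts


def pmi_table(pair_counts):
    """Return PMI(w, c) = log(p(w, c) / (p(w) * p(c))) for every observed pair."""
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (w, c), n in pair_counts.items():
        word_totals[w] += n
        context_totals[c] += n
    # p(w, c) = n / total, p(w) = word_totals[w] / total, p(c) = context_totals[c] / total
    return {
        (w, c): math.log((n * total) / (word_totals[w] * context_totals[c]))
        for (w, c), n in pair_counts.items()
    }


# Hypothetical toy corpus: "a b" occurs twice, "c d" once.
sentences = [["a", "b"], ["a", "b"], ["c", "d"]]
pmi = pmi_table(cooccurrence_counts(sentences, window=1))
for pair, value in sorted(pmi.items()):
    print(pair, round(value, 4))
```

Pairs that never co-occur are simply absent from the table, since their PMI would be the logarithm of zero; how unseen pairs are handled in practice is a design choice this sketch leaves open.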

Abstract

The invention discloses a word vector model based on point mutual information and a text classification method based on CNN. The method comprises the following steps: (1) training a word vector model through a global word vector method based on point mutual information; (2) determining a word vector matrix of the text according to the trained word vector model; (3) extracting features from the word vector matrix with a CNN and training a classification model; (4) extracting the features of an input text according to the trained word vector model and the CNN feature extraction model; (5) according to the text features extracted by the CNN feature extraction model, calculating the mapping distance between the text and the preset categories by softmax and the cross entropy method, wherein the nearest category is the category assigned to the text. The method overcomes the shortcomings of GloVe word vectors in semantic capture and in the statistical co-occurrence matrix, reduces the training complexity of the model, can accurately mine text classification features, is suitable for text classification in various fields, and has great practical value.
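
Step (5) of the abstract can be sketched as follows, assuming the CNN has already produced one raw score per preset category. The category names, the scores, and the function names are hypothetical; the sketch only shows how softmax turns scores into probabilities and how the category with the smallest cross entropy is taken as the nearest one.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def cross_entropy(probs, target_index):
    """Cross entropy between `probs` and a one-hot target at `target_index`."""
    return -math.log(probs[target_index])


def classify(scores, categories):
    """Pick the category whose one-hot target has the smallest cross entropy."""
    probs = softmax(scores)
    losses = [cross_entropy(probs, k) for k in range(len(categories))]
    best = min(range(len(categories)), key=losses.__getitem__)
    return categories[best], probs, losses


categories = ["spam", "news", "sports"]  # hypothetical preset categories
scores = [2.0, 0.5, -1.0]                # hypothetical CNN output scores
label, probs, losses = classify(scores, categories)
print(label, [round(p, 3) for p in probs])
```

Because cross entropy against a one-hot target is the negative log of that category's probability, the smallest loss always coincides with the largest softmax probability.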

Description

Technical field

[0001] The invention relates to the field of text classification in natural language processing, and in particular to a word vector model based on point mutual information and a text classification method based on CNN (convolutional neural network).

Background technique

[0002] With the development of Internet technology, the amount of data on the World Wide Web is increasing day by day. A large share of this data is text, and it touches all walks of life. Facing such a huge volume of text data, how to classify it sensibly has become an important research problem. Sensible, automatic classification of text can help solve many problems, such as spam identification and false information discovery. For text classification, the representation of the text is very important: a reasonable text representation captures accurate semantic information about the text. ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06N3/04, G06K9/62
CPC: G06N3/045, G06F18/24323
Inventors: 李万理, 吴海明, 薛云
Owner: NANJING SILICON INTELLIGENCE TECH CO LTD