Word vector model based on pointwise mutual information and CNN-based text classification method

A word vector and text classification technology, applied in the field of CNN-based text classification and word vector models based on pointwise mutual information. It addresses problems of existing methods, such as the difficulty support vector machines have with multi-class classification, the instability of decision tree structures, and the difficulty decision trees have in learning some relationships.

Active Publication Date: 2020-01-17
NANJING SILICON INTELLIGENCE TECH CO LTD

AI Technical Summary

Problems solved by technology

When m is large, storing and computing the matrix consumes a great deal of machine memory and computing time;
[0017] Second, support vector machines have difficulty solving multi-class classification problems.
[0021] First, the decision tree algorithm overfits very easily, resulting in poor generalization;
[0022] Second, a small change in the samples can cause drastic changes in the tree structure;
[0023] Third, decision trees have difficulty learning some complex relationships, such as XOR.
[0028] Second, approximating complex functions more accurately requires more hidden layers, which makes the network prone to vanishing or exploding gradients;
[0029] Third, sequential data (e.g. audio, text) cannot be processed because such neural networks have no notion of time.
[0038] However, LSTM only mitigates the vanishing gradient problem of RNNs; it does not prevent exploding gradients.
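
The XOR limitation noted in [0023] can be illustrated with a small calculation. The sketch below is not taken from the patent; it assumes a decision tree that greedily picks the split with the highest information gain, and shows that on the XOR truth table neither input feature offers any gain at the root. The function names are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / total
        result -= p * math.log2(p)
    return result

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one binary feature."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[feature] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# XOR truth table: the label is 1 exactly when the two inputs differ.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in rows]

# Gain of splitting on feature 0 and on feature 1, respectively.
gains = [information_gain(rows, labels, f) for f in (0, 1)]
print(gains)
```

Both gains are zero, so a greedy splitter has no reason to prefer either feature, even though splitting on both in sequence separates the classes perfectly.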

Examples

Embodiment Construction

[0104] The foregoing summary has described the solutions of the present invention in sufficient detail. The specific implementation is described below with reference to the drawings and specific embodiments, but the implementation of the present invention is not limited thereto. Any processes or symbols not described in detail below can be understood or implemented by those skilled in the art with reference to the prior art. For example, conventional CNN parameters such as w and b can be understood with reference to existing CNN theory and are not explained again below.

[0105] Referring to Figure 1, in this example the word vector model based on pointwise mutual information and the CNN-based text classification method include:

[0106] (S1) training the word vector model by the global word vector method based on point mutu...
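
Step S1 is truncated in this excerpt, so the patent's training procedure is not available here. As background only, the sketch below computes the standard pointwise mutual information, PMI(w, c) = log(p(w, c) / (p(w) p(c))), from windowed co-occurrence counts. The function names, the window size, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

def cooccurrence_counts(sentences, window=2):
    """Count symmetric word pairs that appear within `window` tokens of each other."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                pair_counts[(word, tokens[j])] += 1
                pair_counts[(tokens[j], word)] += 1
    return pair_counts

def pmi_table(pair_counts):
    """PMI(w, c) = log(p(w, c) / (p(w) * p(c))), estimated from pair counts."""
    total = sum(pair_counts.values())
    word_counts = Counter()
    for (word, _context), count in pair_counts.items():
        word_counts[word] += count
    # p(w, c) = count / total and p(w) = word_counts[w] / total, so the
    # ratio simplifies to count * total / (word_counts[w] * word_counts[c]).
    return {
        (w, c): math.log(count * total / (word_counts[w] * word_counts[c]))
        for (w, c), count in pair_counts.items()
    }

# Hypothetical toy corpus of tokenized sentences.
corpus = [["spam", "offer"], ["spam", "offer"], ["meeting", "agenda"]]
pmi = pmi_table(cooccurrence_counts(corpus))
print(pmi[("spam", "offer")], pmi[("meeting", "agenda")])
```

In this toy corpus the pair that occurs only once but involves otherwise unseen words receives the higher score, which reflects how PMI rewards exclusivity of co-occurrence and not raw frequency.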

Abstract

The invention discloses a word vector model based on pointwise mutual information and a CNN-based text classification method. The method includes: (1) training the word vector model through a global word vector method based on pointwise mutual information; (2) determining the word vector matrix of a text according to the trained word vector model; (3) extracting features from the word vector matrix with a CNN and training the classification model; (4) extracting the features of an input text with the trained word vector model and the CNN feature extraction model; (5) using softmax and cross entropy on the text features obtained by the CNN feature extraction model to calculate the mapping distance between the text and each preset category, and taking the closest category as the category of the text. The method overcomes the shortcomings of GloVe word vectors in semantic capture and in the statistical co-occurrence matrix, reduces the complexity of model training, and can accurately mine the classification features of text. It is suitable for text classification in various fields and has great practical value.
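
Steps (2) to (5) of the abstract can be sketched as a single forward pass. This is a minimal illustration and not the patented model: the word vectors, kernels, and output weights are hand-picked hypothetical values instead of trained parameters, and max-over-time pooling is an assumption, since the excerpt does not describe the pooling scheme.

```python
import math

def text_matrix(tokens, vectors):
    """Step (2): stack the word vectors of a text into a (length x dim) matrix."""
    return [vectors[t] for t in tokens]

def conv_max_pool(matrix, kernel):
    """Step (3): slide a (height x dim) kernel over the rows, then ReLU and max-pool."""
    height = len(kernel)
    best = 0.0  # starting at 0 makes the running maximum equal to max over ReLU outputs
    for start in range(len(matrix) - height + 1):
        total = 0.0
        for k in range(height):
            total += sum(w * x for w, x in zip(kernel[k], matrix[start + k]))
        best = max(best, total)
    return best

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def cross_entropy(probs, target):
    """Cross entropy between the predicted distribution and a one-hot target."""
    return -math.log(probs[target])

# Hypothetical 2-dimensional word vectors, kernels, and output weights.
vectors = {"free": [1.0, 0.0], "prize": [1.0, 0.0],
           "team": [0.0, 1.0], "meeting": [0.0, 1.0]}
kernels = [[[1.0, 0.0], [1.0, 0.0]],   # responds to "spam-like" bigrams
           [[0.0, 1.0], [0.0, 1.0]]]   # responds to "work-like" bigrams
weights = [[1.0, 0.0], [0.0, 1.0]]     # class 0 = spam, class 1 = work

# Step (4): extract features for an input text.
matrix = text_matrix(["free", "prize"], vectors)
features = [conv_max_pool(matrix, k) for k in kernels]

# Step (5): score each category and keep the one with the smallest cross entropy.
scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
probs = softmax(scores)
predicted = min(range(len(probs)), key=lambda c: cross_entropy(probs, c))
print(features, probs, predicted)
```

Choosing the category with the smallest cross entropy is equivalent to choosing the one with the largest softmax probability, which is one way to read the "closest distance" wording of step (5).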

Description

Technical field

[0001] The invention relates to text classification in natural language processing, in particular to a word vector model based on pointwise mutual information and a text classification method based on a CNN (convolutional neural network).

Background

[0002] With the development of Internet technology, the amount of data on the World Wide Web grows day by day. Much of it is text data, covering all walks of life. Given such a huge volume of text, how to classify the data sensibly has become an important research problem. Automatic text classification helps solve many problems, such as spam identification and false information detection. Text representation is critical to text classification: a reasonable text representation captures accurate semantic information. ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06N3/04, G06K9/62
CPC: G06N3/045, G06F18/24323
Inventor: 李万理, 吴海明, 薛云
Owner: NANJING SILICON INTELLIGENCE TECH CO LTD