
Text classification method and terminal device

A text classification method and terminal device in the field of information processing. The technology addresses the inability of existing word-vector methods to deeply recognize and understand text. Its stated effects are fast and accurate search, a simple and fast calculation process, and improved efficiency.

Status: Inactive; Publication date: 2017-01-11
SHANGHAI GAOXIN COMP SYST CO LTD
Cites: 5; Cited by: 23

AI Technical Summary

Problems solved by technology

[0003] In the process of realizing the present invention, the inventors of the present application found that CBOW training based on the Hierarchical Softmax classifier favors rare words and can classify this type of text faster, whereas CBOW training based on the negative sampling algorithm favors the classification of text made up of common words and works well with low-dimensional vectors. With either algorithm, the window size chosen for CBOW training is usually about 5. The word vectors obtained by these two training methods carry certain semantic features, but they cannot deeply recognize and understand the content of the text.
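Paragraph [0003] notes that CBOW is usually trained with a window of about 5. As a minimal sketch (the function name `cbow_pairs` is hypothetical, not taken from the patent), the code below builds the (context, target) pairs that CBOW learns from: for each word, the context is up to `window` words on each side, and the model learns to predict the word from that context. The reference word2vec implementation randomly shrinks the window at each position; this sketch keeps it fixed for clarity.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 5) -> List[Tuple[List[str], str]]:
    """Build CBOW (context, target) pairs from one segmented text.

    The context of tokens[i] is up to `window` tokens before it and up to
    `window` tokens after it; the target is tokens[i] itself.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # a one-token text has no context to predict from
            pairs.append((context, target))
    return pairs


if __name__ == "__main__":
    words = ["text", "classification", "uses", "word", "vectors"]
    for context, target in cbow_pairs(words, window=2):
        print(target, "<-", context)
```

A larger window captures broader topical context per word at the cost of more training work per position.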




Embodiment Construction

[0036] To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that each embodiment provides many technical details to help readers better understand the present application; however, the technical solution claimed in this application can be realized even without these technical details and with various changes and modifications based on the following embodiments.

[0037] The first embodiment of the present invention relates to a text classification method based on word vectors. The specific process is shown in Figure 1.

[0038] In step 101, the word vector matrix $W_{ij}$ is calculated: the segmented data of the training samples of N text types are input into the continuous bag-of-words (CBOW) model, and calculate ...
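The excerpt of step 101 is cut off before it says how CBOW is trained. As a sketch under stated assumptions, the pure-Python code below trains CBOW with negative sampling (one of the two objectives named in [0003]) and returns a word vector matrix, assuming that $W_{ij}$ is the j-th component of the i-th vocabulary word's vector. The names and hyperparameters (`train_cbow_ns`, `dim`, `negative`, `lr`, `epochs`) are hypothetical. Negatives are drawn uniformly here for simplicity, whereas word2vec draws them from the unigram distribution raised to the 3/4 power.

```python
import math
import random
from typing import List, Tuple


def sigmoid(x: float) -> float:
    # Clip the argument so math.exp never overflows.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train_cbow_ns(
    texts: List[List[str]],
    dim: int = 20,
    window: int = 5,
    negative: int = 5,
    lr: float = 0.05,
    epochs: int = 50,
    seed: int = 0,
) -> Tuple[List[str], List[List[float]]]:
    """Train CBOW word vectors with negative sampling on segmented texts.

    Returns the sorted vocabulary and the word vector matrix W, where
    W[i][j] is the j-th component of the vector for vocab[i].
    """
    rng = random.Random(seed)
    vocab = sorted({w for tokens in texts for w in tokens})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    # Input vectors (the word vector matrix W) and output weights.
    W = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in range(n_words)]
    W_out = [[0.0] * dim for _ in range(n_words)]

    for _ in range(epochs):
        for tokens in texts:
            ids = [index[w] for w in tokens]
            for i, target in enumerate(ids):
                ctx = ids[max(0, i - window):i] + ids[i + 1:i + 1 + window]
                if not ctx:
                    continue
                # Hidden layer: the mean of the context word vectors.
                h = [sum(W[c][k] for c in ctx) / len(ctx) for k in range(dim)]
                grad_h = [0.0] * dim
                # One positive sample plus `negative` uniformly drawn negatives.
                samples = [(target, 1.0)]
                samples += [(rng.randrange(n_words), 0.0) for _ in range(negative)]
                for j, label in samples:
                    if label == 0.0 and j == target:
                        continue  # never treat the target as its own negative
                    score = sigmoid(sum(h[k] * W_out[j][k] for k in range(dim)))
                    g = lr * (label - score)
                    for k in range(dim):
                        grad_h[k] += g * W_out[j][k]  # uses W_out before its update
                        W_out[j][k] += g * h[k]
                # Push the accumulated error back into every context word vector.
                for c in ctx:
                    for k in range(dim):
                        W[c][k] += grad_h[k]
    return vocab, W


if __name__ == "__main__":
    corpus = [["stock", "market", "rises"], ["team", "wins", "match"]]
    vocab, W = train_cbow_ns(corpus, dim=8, window=2)
    print(vocab, len(W), len(W[0]))
```

Rows of `W` follow the sorted vocabulary, so `W[vocab.index(word)]` is that word's vector.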



Abstract

The invention relates to the field of information processing and discloses a text classification method and a terminal device. In the embodiment, the method comprises: calculating a word vector matrix containing all word vectors obtained after the training samples of N text types are segmented; calculating the feature vector of the training samples of each text type based on the word vector matrix; calculating the input of a back-propagation neural network from the calculated feature vectors of the training samples; determining a text classifier from the back-propagation neural network; and finally determining the type of a text to be tested from its feature vector and the text classifier. Because CBOW word vectorization takes into account the relation between the current word and a few preceding and following words, and because it is combined with a classical back-propagation neural network, the trained network as a whole carries semantic features, can better recognize and understand text content, and achieves a better training effect.
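The abstract does not say how the feature vectors are formed or how the back-propagation network is configured. A minimal sketch of the classification stage, assuming that a text's feature vector is the mean of its word vectors and using a one-hidden-layer sigmoid network trained by plain back-propagation on squared error, could look like the following. `text_feature`, `BPClassifier`, the toy vectors, and all hyperparameters are hypothetical; in practice the vectors would come from the CBOW step.

```python
import math
import random
from typing import Dict, List, Sequence


def text_feature(tokens: Sequence[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Assumed feature vector: the mean of the vectors of the known tokens."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def _sigmoid(x: float) -> float:
    x = max(-30.0, min(30.0, x))  # avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-x))


class BPClassifier:
    """One-hidden-layer back-propagation network with sigmoid units."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x: Sequence[float]):
        h = [_sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [_sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def fit(self, X: List[List[float]], labels: List[int], epochs: int = 2000, lr: float = 0.5):
        n_out = len(self.w2)
        for _ in range(epochs):
            for x, label in zip(X, labels):
                h, y = self._forward(x)
                t = [1.0 if k == label else 0.0 for k in range(n_out)]  # one-hot target
                # Output deltas for squared error with sigmoid outputs.
                d_out = [(y[k] - t[k]) * y[k] * (1.0 - y[k]) for k in range(n_out)]
                # Hidden deltas, back-propagated through w2 before w2 is updated.
                d_hid = [sum(d_out[k] * self.w2[k][j] for k in range(n_out)) * h[j] * (1.0 - h[j])
                         for j in range(len(h))]
                for k in range(n_out):
                    for j in range(len(h)):
                        self.w2[k][j] -= lr * d_out[k] * h[j]
                    self.b2[k] -= lr * d_out[k]
                for j in range(len(h)):
                    for i in range(len(x)):
                        self.w1[j][i] -= lr * d_hid[j] * x[i]
                    self.b1[j] -= lr * d_hid[j]
        return self

    def predict(self, x: Sequence[float]) -> int:
        """Return the index of the text type with the highest output."""
        _, y = self._forward(x)
        return max(range(len(y)), key=y.__getitem__)


# Toy usage with hand-made 2-d word vectors (type 0 = sports, type 1 = finance).
vectors = {"goal": [1.0, 0.0], "match": [0.9, 0.1], "stock": [0.0, 1.0], "market": [0.1, 0.9]}
train = [(["goal", "match"], 0), (["match"], 0), (["stock", "market"], 1), (["market"], 1)]
X = [text_feature(tokens, vectors) for tokens, _ in train]
y = [label for _, label in train]
clf = BPClassifier(n_in=2, n_hidden=4, n_out=2).fit(X, y)
print(clf.predict(text_feature(["goal", "wins"], vectors)))
```

Averaging is the simplest pooling choice; a weighted scheme would slot into `text_feature` without changing the network.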

Description

Technical field

[0001] The invention relates to the field of information processing, in particular to a text classification method and a terminal device.

Background art

[0002] Text classification refers to taking a set of texts that experts have classified in advance as a training sample set, analyzing that set to derive a classification pattern, and using the derived pattern to classify other texts. It is mainly used in information retrieval, machine translation, automatic summarization, and information filtering.


Application Information

IPC(8): G06F17/30; G06N3/08
CPC: G06F16/355; G06N3/084
Inventors: 周诚 (Zhou Cheng), 赵世亭 (Zhao Shiting)
Owner: SHANGHAI GAOXIN COMP SYST CO LTD