Text classification method combining dynamic word embedding with part-of-speech tagging

A text classification technology that combines dynamic word embedding with part-of-speech tagging, applied in the field of mobile communications. It addresses three problems in existing approaches: sentence grammatical structure is not fully used, text classification accuracy is limited, and the model cannot learn the target corpus well. The stated effects are high accuracy and strong versatility.

Status: Inactive; Publication Date: 2017-10-24
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

However, these studies mainly use static word embedding: the value of every vector element in the word embedding stays fixed during model training, so the model cannot adequately learn the characteristics of the text in the target corpus.


Examples

Embodiment

[0028] This embodiment discloses a multi-channel deep neural network that combines part-of-speech tagging with dynamic word embedding, and applies it to automatic text classification. The main idea is to use vectors to give a mathematical representation of each word in the sentence and of its part of speech. On the one hand, the real-valued vector of each word in the preprocessed sentence is taken from a pre-trained word embedding table. On the other hand, after each word in the sentence is tagged with its part of speech, each part of speech is randomly initialized from a uniform distribution as a real-valued vector of a specified dimension. Two separate bidirectional LSTM layers then learn from the two inputs respectively, capturing the contextual relationships of the words and of the parts of speech, and their results are combined into a dual channel; on this basis, the dual channel is passed to a ...
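The two input representations described in this embodiment can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the toy embedding table, the vector dimensions, the uniform range of ±0.25, the zero vector for unknown words, and the names `init_pos_table` and `embed_sentence` are all assumptions made for the example, and the bidirectional LSTM layers are not shown.

```python
import random

# Hypothetical pre-trained word embedding table (toy 4-dimensional vectors).
# In practice it would be loaded from vectors trained on a large corpus.
PRETRAINED = {
    "the": [0.1, 0.3, -0.2, 0.0],
    "movie": [0.7, -0.1, 0.4, 0.2],
    "was": [0.0, 0.2, 0.1, -0.3],
    "great": [0.5, 0.6, -0.4, 0.1],
}
WORD_DIM = 4
POS_DIM = 3  # assumed dimension of the part-of-speech vectors


def init_pos_table(tags, dim, low=-0.25, high=0.25, seed=0):
    """Draw one real-valued vector per POS tag from U(low, high)."""
    rng = random.Random(seed)
    return {
        tag: [rng.uniform(low, high) for _ in range(dim)]
        for tag in sorted(set(tags))
    }


def embed_sentence(tokens, tags, word_table, pos_table):
    """Return the two input sequences: word vectors and POS vectors."""
    unk = [0.0] * WORD_DIM  # assumption: unknown words map to a zero vector
    word_seq = [list(word_table.get(tok.lower(), unk)) for tok in tokens]
    pos_seq = [list(pos_table[tag]) for tag in tags]
    return word_seq, pos_seq


tokens = ["The", "movie", "was", "great"]
tags = ["DT", "NN", "VBD", "JJ"]  # assumed output of an external POS tagger
pos_table = init_pos_table(tags, POS_DIM)
word_seq, pos_seq = embed_sentence(tokens, tags, PRETRAINED, pos_table)
print(len(word_seq), len(word_seq[0]))  # sequence length, word dimension
print(len(pos_seq), len(pos_seq[0]))    # sequence length, POS dimension
```

Each sequence would then feed its own bidirectional LSTM layer; keeping the two tables separate lets the word vectors start from pre-trained values while the part-of-speech vectors start from random ones.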

Abstract

The invention discloses a text classification method, based on a deep neural network, that combines dynamic word embedding with part-of-speech tagging. The method exploits the more accurate grammatical and semantic information that a large-scale corpus can provide, and it also adjusts the word embeddings to the characteristics of the target corpus during model training, so that those characteristics are learned better. Classification accuracy is further improved by incorporating the part-of-speech information of the words in each sentence. The invention also combines the strength of LSTM in learning the context of words and parts of speech within a sentence with the strength of CNN in learning local text features. The proposed classification model offers high accuracy and strong generality, and achieves good results on several well-known public corpora, including the IMDB corpus, Movie Review, and TREC.
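The difference between static and dynamic word embedding can be illustrated with a small sketch. It is not the model described in the patent: a logistic classifier over the mean of the word vectors stands in for the network, and the function `sgd_step`, its `trainable` flag, the learning rate, and the toy values are assumptions for illustration only.

```python
import math


def sgd_step(emb, weights, token_ids, label, lr=0.1, trainable=True):
    """One SGD step for a logistic classifier over the mean word vector.

    emb is the embedding table (mutated in place only when trainable is
    True); weights are the classifier weights (always updated).
    """
    n = len(token_ids)
    dim = len(weights)
    # Forward pass: average the word vectors, then apply a sigmoid.
    mean = [sum(emb[t][d] for t in token_ids) / n for d in range(dim)]
    score = sum(w * x for w, x in zip(weights, mean))
    prob = 1.0 / (1.0 + math.exp(-score))
    err = prob - label  # gradient of the log loss w.r.t. the score
    # Gradient for each word occurrence, computed before weights change.
    grad_emb = [err * w / n for w in weights]
    for d in range(dim):
        weights[d] -= lr * err * mean[d]
    if trainable:  # dynamic embedding: the table is a trained parameter
        for t in token_ids:
            for d in range(dim):
                emb[t][d] -= lr * grad_emb[d]
    return prob


pretrained = [[0.2, -0.1], [0.4, 0.3], [-0.3, 0.5]]  # toy table: 3 words x 2 dims
static_emb = [row[:] for row in pretrained]
dynamic_emb = [row[:] for row in pretrained]
w_static, w_dynamic = [0.5, -0.5], [0.5, -0.5]

sgd_step(static_emb, w_static, [0, 1], label=1, trainable=False)
sgd_step(dynamic_emb, w_dynamic, [0, 1], label=1, trainable=True)

print(static_emb == pretrained)         # frozen table is unchanged
print(dynamic_emb[0] != pretrained[0])  # vectors of the words used are adjusted
print(dynamic_emb[2] == pretrained[2])  # an unused word keeps its pre-trained value
```

In both cases the table starts from the pre-trained values; only the dynamic variant lets the training signal from the target corpus move them.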

Description

Technical field

[0001] The invention relates to the technical field of mobile communication, and specifically to a text classification method combining dynamic word embedding and part-of-speech tagging.

Background technique

[0002] Automatic text classification based on machine learning is the process of using computer algorithms to analyze the content of a text and automatically determine its category within a given classification system. Early research was mainly based on shallow machine learning and statistics, and represented a sentence mathematically with one-hot encoding (also known as one-of-V, where V is the size of the dictionary) or with distributional methods (such as bag-of-words combined with word frequency, co-occurrence information, TF-IDF, or entropy). The main disadvantage of this representation is that it cannot express the semantics of the language units in a sentence (such as words or phrase n-grams) and the r...
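The limitation of the one-hot representation mentioned in the background can be shown with a toy example. The three-word dictionary and the helper names `one_hot` and `dot` are assumptions made for illustration.

```python
def one_hot(word, vocab):
    """Return a one-of-V vector: 1 at the word's index, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


vocab = ["good", "great", "terrible"]  # toy dictionary, V = 3
good, great, terrible = (one_hot(w, vocab) for w in vocab)

# Any two distinct one-hot vectors are orthogonal, so a near-synonym pair
# looks exactly as unrelated as an antonym pair.
print(dot(good, great), dot(good, terrible))
```

Because every pair of different words has an inner product of zero, the representation carries no information about semantic similarity.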

Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F40/284
Inventors: 苏锦钿, 李鹏飞, 罗达
Owner: SOUTH CHINA UNIV OF TECH