Character-level nested deep network-based text classification method

A deep-network text classification technology, applied in text database clustering/classification, neural learning methods, biological neural network models, etc. It addresses weak feature expression, failure to consider the relationships between words, and poor handling of low-frequency words, and aims at more accurate text classification, more effective features, and markedly reduced dimensionality.

Active Publication Date: 2018-03-23
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

[0005] As the above shows, feature selection plays a very important role in text classification. The main problems of traditional methods are that the text representation is high-dimensional and sparse and that its feature expression ability is weak. In addition, traditional text classification methods do not consider the relationships between words and handle low-frequency words poorly.



Embodiment Construction

[0030] The present invention is further described below with reference to a specific embodiment:

[0031] As shown in Figure 1, the character-level text classification method based on a nested deep network described in this embodiment comprises the following steps:

[0032] S1. Construct a character vector matrix table:

[0033] Let C be the character set used in the text (for English, letters and various special symbols; for Chinese, strokes and various special symbols). Construct a character vector matrix Q ∈ R^{|C|×|C|} using one-hot encoding: the diagonal elements are all set to 1 and all other elements are 0. Each row vector of Q represents one character, and the row number corresponding to each character is recorded.
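Step S1 can be illustrated with a minimal sketch in Python. This is not the patent's implementation: the function name `build_char_table`, the helper `char_vector`, and the example character set (lowercase letters, digits, and a space) are assumptions made only for illustration. The sketch builds the |C|×|C| one-hot matrix Q and records the row number of each character.

```python
import string


def build_char_table(charset):
    """Return (Q, row_of) for the given characters.

    Q is a |C| x |C| one-hot matrix (1 on the diagonal, 0 elsewhere);
    row_of maps each character to its row number in Q.
    """
    chars = list(dict.fromkeys(charset))  # drop duplicates, keep order
    size = len(chars)
    Q = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    row_of = {ch: i for i, ch in enumerate(chars)}
    return Q, row_of


# Hypothetical character set, chosen only for this example.
C = string.ascii_lowercase + string.digits + " "
Q, row_of = build_char_table(C)


def char_vector(ch):
    """Look up the one-hot row vector that represents character ch."""
    return Q[row_of[ch]]
```

Because Q is an identity matrix, storing only `row_of` and generating rows on demand would be equivalent; the full matrix is kept here to mirror the description.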

[0034] S2. Short text preprocessing: convert the short text into a character vector matrix, in two steps:

[0035] S21, matri...


Abstract

The invention relates to a character-level text classification method based on a nested deep network. The method comprises the following steps: S1, constructing a character vector matrix table; S2, preprocessing the short text; S3, extracting high-dimensional sequence features with an improved ResNet; and S4, classifying with an LSTM network. Character-level text conversion can effectively convert all texts, does not ignore low-frequency words, and markedly reduces the dimension compared with a conventional vector space model. In addition, the improved ResNet learns its feature extraction by itself; compared with conventional methods such as the TF-IDF formula, mutual information, information gain, and the χ² statistic, the extracted features are more effective and more abstract. Finally, LSTM network classification takes the sequential relationship between words into account, so that text classification can be performed more accurately.

Description

Technical field

[0001] The invention relates to the technical field of text classification, in particular to a character-level text classification method based on a nested deep network.

Background

[0002] With the continuous development of network technology, the Internet generates massive unstructured text data every day. In order to obtain useful value from these massive data, we need to classify these texts.

[0003] Early text classification mainly classifies texts by manually defining some rules. This method is time-consuming and laborious, and one must have sufficient understanding of a certain field to write appropriate rules. With the emergence of a large number of online texts and the rise of machine learning, large-scale text (including web page) classification and retrieval has aroused researchers' interest again. The text classification system first establishes a discriminant rule or classifier by training on a pre-classified text set, so as to automa...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30; G06N3/04; G06N3/08
CPC: G06N3/049; G06N3/08; G06F16/35
Inventors: 郑子彬, 李晓杰, 吴向军
Owner: SUN YAT SEN UNIV