A character-level text classification method based on nested deep networks

A deep-network text classification technology, applied in text database clustering/classification, neural learning methods, biological neural network models, and related fields. It addresses weak feature expression ability, failure to consider the relationships between words, and poor handling of low-frequency words, achieving more accurate text classification, more effective features, and a marked reduction in dimensionality.

Active Publication Date: 2021-08-10
SUN YAT SEN UNIV
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] As can be seen from the above, feature selection plays a very important role in text classification. The main problems of traditional methods are that the text representation is high-dimensional and sparse and that its feature expression ability is very weak. In addition, traditional text classification methods do not consider the relationships between words and handle low-frequency words poorly.

Examples

Embodiment Construction

[0030] The present invention is further described below in conjunction with a specific embodiment:

[0031] As shown in attached figure 1, the character-level text classification method based on a nested deep network described in this embodiment comprises the following steps:

[0032] S1. Construct a character vector matrix table:

[0033] Let C be the character set used in the text (for English, letters and various special symbols; for Chinese, strokes and various special symbols). Construct a character vector matrix $Q \in \mathbb{R}^{|C| \times |C|}$ using one-hot encoding: the diagonal elements are all set to 1 and the rest are 0. Each row vector of Q represents one character, and the row number corresponding to each character is recorded.
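Step S1 can be illustrated with a minimal sketch in plain Python. The function name `build_char_table`, the sorted ordering of characters, and the five-character toy set are assumptions made for illustration; the patent only requires that Q be a |C|×|C| one-hot matrix and that each character's row number be recorded.

```python
def build_char_table(charset):
    """Return (Q, row_of) for the given characters.

    Q is a |C| x |C| one-hot matrix (1 on the diagonal, 0 elsewhere),
    stored as nested lists. row_of maps each character to its row number.
    """
    # Assumption: any fixed ordering of the characters is acceptable.
    chars = sorted(set(charset))
    n = len(chars)
    Q = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    row_of = {ch: i for i, ch in enumerate(chars)}
    return Q, row_of


# Hypothetical toy character set; a real set C would be much larger.
Q, row_of = build_char_table("abc !")

# The row vector that represents the character "b".
vector_for_b = Q[row_of["b"]]
```

Because Q is an identity matrix, looking up a character's row is equivalent to producing its one-hot vector, so in practice only the `row_of` table needs to be stored.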

[0034] S2. Short text preprocessing: the short text is converted into a character vector matrix in two steps:

[0035] S21, matri...
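The sub-steps of S2 are cut off in this copy, so the sketch below does not reproduce them. It only illustrates the stated goal of S2, turning a short text into a character vector matrix, under two assumptions that are not taken from the patent: each character is replaced by its one-hot row, and a character missing from the table becomes an all-zero row. The name `text_to_matrix` is hypothetical.

```python
def text_to_matrix(text, row_of, size):
    """Map each character of `text` to a one-hot row vector of length `size`.

    Assumption (not from the patent): characters that are absent from
    `row_of` are encoded as all-zero rows.
    """
    rows = []
    for ch in text:
        vec = [0] * size
        idx = row_of.get(ch)
        if idx is not None:
            vec[idx] = 1
        rows.append(vec)
    return rows


# Hypothetical toy character table, built the same way as in step S1.
chars = sorted(set("abc !"))
row_of = {ch: i for i, ch in enumerate(chars)}

# One row per character of the input; "?" is not in the table.
M = text_to_matrix("cab?", row_of, len(chars))
```

The resulting matrix has one row per input character and |C| columns, so its width depends on the size of the character set rather than on the size of a word vocabulary.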

Abstract

The invention relates to a character-level text classification method based on a nested deep network, comprising the following steps: S1, constructing a character vector matrix table; S2, short text preprocessing; S3, extracting high-dimensional sequence features with an improved ResNet; S4, LSTM network classification. The character-level text conversion of the invention can convert all texts effectively without ignoring low-frequency words, and its dimensionality is significantly lower than that of the traditional vector space model. In addition, the improved ResNet learns its feature extraction by itself, so the extracted features are more effective and more abstract than those obtained with traditional methods such as the TF-IDF formula, mutual information, information gain, and the χ² statistic. Finally, LSTM network classification can take the order relationships between words into account, so that text classification can be performed more accurately.

Description

Technical field

[0001] The invention relates to the technical field of text classification, in particular to a character-level text classification method based on a nested deep network.

Background technique

[0002] With the continuous development of network technology, the Internet generates massive amounts of unstructured text data every day. To extract value from this data, the texts need to be classified.

[0003] Early text classification relied mainly on manually defined rules. This approach is time-consuming and laborious, and writing appropriate rules requires a thorough understanding of the field in question. With the emergence of large volumes of online text and the rise of machine learning, large-scale text (including web page) classification and retrieval has again attracted researchers' interest. A text classification system first establishes a discriminant rule or classifier by training on a pre-classified text set, so as to automa...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06N3/04, G06N3/08
CPC: G06N3/049, G06N3/08, G06F16/35
Inventors: 郑子彬, 李晓杰, 吴向军
Owner: SUN YAT SEN UNIV