Chinese text classification method based on attention mechanism and feature enhancement fusion

A text classification and attention technology, applied in computer components, character and pattern recognition, special data processing applications, and related fields. It addresses problems such as weights that are not properly configured, a single processing granularity, and unevenly distributed feature information, with the effect of improving recognition ability and effectiveness.

Inactive Publication Date: 2018-10-30
HARBIN UNIV OF SCI & TECH
View PDF4 Cites 61 Cited by

AI Technical Summary

Problems solved by technology

Therefore, the character-level question answering system based on the attention mechanism performs better; however, its processing granularity is limited to the character level. If the attention generated at word-level and sentence-level granularity were also taken into account, the features would be richer.
[0004] Classifying relatively long Chinese text is problematic because the components that carry important feature information are unevenly distributed in the text. The attention mechanism can reflect how much each element of the Chinese text contributes to recognition and can assign greater weights to important elements. However, the weight matrix is learned through iterative training of the neural network, which is a continuous learning process, so there is no guarantee that all weights are properly configured. This may lead to insufficient feature extraction, or to extracted features that do not capture the deep semantics of the Chinese text comprehensively enough.



Examples


Specific Embodiment 1

[0029] This embodiment is a specific embodiment of the Chinese text classification method based on attention mechanism and feature enhancement fusion.

[0030] A Chinese text classification method based on attention mechanism and feature enhancement fusion, comprising the following steps:

[0031] Step a: organize the original Chinese text corpus, perform word segmentation on it, pre-train the word vector dictionary, and perform text preprocessing;

[0032] Step b: preprocess the Chinese text corpus into N-dimensional term-based vectors; perform feature selection on the preprocessed text to form the feature space of the text data set;

[0033] Step c: after preprocessing and before entering the neural network module for training and testing, the original Chinese text corpus is stored in the embedding matrix of the embedding layer, where each row is the vector representation of one text document;

[0034...
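The preprocessing in steps a to c can be sketched roughly as follows. This is a minimal illustration rather than the patented procedure: it assumes the documents are already segmented into terms, uses a hypothetical `min_count` threshold as a stand-in for feature selection, and fills a per-term lookup table with random values in place of a pre-trained word vector dictionary. The names `build_vocab`, `encode`, and `init_embedding_matrix` are illustrative and do not come from the source.

```python
import random

PAD, UNK = "<pad>", "<unk>"

def build_vocab(segmented_docs, min_count=1):
    """Map each term to an integer id; ids 0 and 1 are reserved for padding and unknown terms."""
    counts = {}
    for doc in segmented_docs:
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
    vocab = {PAD: 0, UNK: 1}
    for term in sorted(counts):
        # Assumption: a simple frequency threshold stands in for feature selection.
        if counts[term] >= min_count:
            vocab[term] = len(vocab)
    return vocab

def encode(doc, vocab, n):
    """Turn one segmented document into a fixed-length list of n term ids."""
    ids = [vocab.get(term, vocab[UNK]) for term in doc[:n]]
    return ids + [vocab[PAD]] * (n - len(ids))

def init_embedding_matrix(vocab, dim, seed=0):
    """Placeholder for pre-trained word vectors: one row of `dim` floats per term."""
    rng = random.Random(seed)
    matrix = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in vocab]
    matrix[vocab[PAD]] = [0.0] * dim  # padding contributes nothing
    return matrix

# Two already-segmented sample documents (hypothetical data).
docs = [["哈尔滨", "理工", "大学"], ["文本", "分类", "方法", "研究"]]
vocab = build_vocab(docs)
encoded = [encode(d, vocab, n=5) for d in docs]   # one row per document
embedding = init_embedding_matrix(vocab, dim=4)   # one row per term
```

Here each row of `encoded` is the N-dimensional term-based representation of one document (N = 5 in the sample), and the ids index into `embedding` when the network looks up word vectors.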

Specific Embodiment 2

[0046] This embodiment is a specific embodiment of the Chinese text classification method based on attention mechanism and feature enhancement fusion, in which the attention mechanism model is the semantic feature difference attention algorithm model.

[0047] In the Chinese text classification method based on attention mechanism and feature enhancement fusion, the attention mechanism model consists of the semantic feature difference attention algorithm model, which comprises the following steps:

[0048] Step a1: the text input to the semantic feature difference attention algorithm model is denoted TEXT, and the word vectors x1 and x2 in the text are determined;

[0049] Step b1: import the word vectors x1 and x2 into the encoder LSTM and perform an encoding operation on them; the word vector x1 is encoded as a semantic code h1, the w...
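Because the remaining steps of the semantic feature difference attention model are cut off here, the sketch below shows only generic dot-product attention over encoder outputs, not the patented algorithm. It assumes the semantic codes (such as h1) are already available as plain vectors; the second code `h2`, the `query` vector, and the function names are hypothetical and introduced purely for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(hidden_states, query):
    """Score each semantic code against the query, then return weights and weighted sum."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context

# Assumed stand-ins for the encoder LSTM outputs of x1 and x2.
h1 = [1.0, 0.0]
h2 = [0.0, 1.0]
query = [2.0, 0.0]  # hypothetical query vector

weights, context = attend([h1, h2], query)
```

The weights sum to one, and the code that aligns better with the query (here `h1`) receives the larger weight, which is the sense in which attention assigns greater weight to more important text elements.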

Specific Embodiment 3

[0085] This embodiment is a specific embodiment of the CNN module in the Chinese text classification method based on attention mechanism and feature enhancement fusion.

[0086] In the Chinese text classification method based on attention mechanism and feature enhancement fusion, the CNN module includes three-layer one-dimensional convolutional neural networks with two convolution kernel sizes, CNN3 and CNN4; the CNN3 convolution kernel size is 3 × the word vector dimension, and the CNN4 convolution kernel size is 4 × the word vector dimension.

[0087] The structure of a CNN makes it well suited to extracting, under different convolution kernel sizes, local feature information from the numerical vector representation of the Chinese text corpus.

[0088] The CNN module contains two convolutional neural network channels, CNN3 and CNN4; each has three one-dimensional convolutional layers connected by a max pooling layer in the middle; the output wi...
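As a rough illustration of the two kernel sizes, the sketch below slides a kernel spanning 3 words and a kernel spanning 4 words over a small sentence matrix and keeps the strongest response of each. It is a simplification under stated assumptions: each channel has a single convolutional layer followed by global max pooling rather than the three layers described above, the kernel weights are fixed by hand instead of learned, and all function and variable names are hypothetical.

```python
def conv1d_over_words(sentence, kernel):
    """Slide a (k x dim) kernel over a (length x dim) sentence matrix.

    Returns length - k + 1 values, one per window of k consecutive words.
    """
    k = len(kernel)
    out = []
    for start in range(len(sentence) - k + 1):
        window = sentence[start:start + k]
        out.append(sum(w * x
                       for k_row, s_row in zip(kernel, window)
                       for w, x in zip(k_row, s_row)))
    return out

def relu(values):
    return [max(0.0, v) for v in values]

def max_pool(values):
    """Global max pooling: keep the strongest response of a feature map."""
    return max(values)

dim = 2                                          # assumed word vector dimension
sentence = [[float(i), 1.0] for i in range(6)]   # 6 words, each a dim-sized vector

kernel3 = [[1.0, 0.0]] * 3   # CNN3-style kernel: 3 x word vector dimension
kernel4 = [[1.0, 0.0]] * 4   # CNN4-style kernel: 4 x word vector dimension

feat3 = conv1d_over_words(sentence, kernel3)
feat4 = conv1d_over_words(sentence, kernel4)
fused = [max_pool(relu(feat3)), max_pool(relu(feat4))]  # one feature per channel
```

The wider kernel produces a shorter feature map because fewer 4-word windows fit in the sentence; concatenating the pooled outputs is one simple way to combine the two channels.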



Abstract

The present invention provides a Chinese text classification method based on an attention mechanism and feature enhancement fusion, and belongs to the technical field of data mining. The method proposes a feature enhancement fusion Chinese text classification model based on the attention mechanism, a long short-term memory (LSTM) network, and a convolutional neural network (CNN), together with a feature difference enhanced attention algorithm model. In the feature enhancement fusion Chinese text classification model, double-layer LSTM and CNN modules sequentially perform enhancement fusion on the text features extracted by the attention mechanism. The richness of the extracted text features is thereby continuously enhanced, and the features become more comprehensive and detailed, so that the model's ability to recognize Chinese text features is improved.

Description

Technical field

[0001] The invention relates to a Chinese text classification method based on attention mechanism and feature enhancement fusion, which belongs to the technical field of data mining.

Background technique

[0002] With the popularization of Internet applications, the number of electronic documents on the Internet is growing rapidly. In order to mine effective information from massive electronic documents quickly, accurately, and comprehensively, text classification technology has received widespread attention in recent years. Chinese texts are becoming more and more abundant, and the utilization rate of Chinese information is also increasing. Therefore, automatically classifying Chinese texts is of great practical significance.

[0003] In the field of natural language processing, through the design and improvement of artificial intelligence-related algorithm models based on neural networks, the proposed algorithm model is more in line with the style chara...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06N3/04, G06K9/62
CPC: G06F40/289, G06N3/045, G06F18/24
Inventor: 谢金宝, 侯永进, 马俊杰, 梁欣涛, 王玉静, 王滨生
Owner: HARBIN UNIV OF SCI & TECH