
Document classification method based on hierarchical multi-attention network

A document classification and attention technology, applied in text database clustering/classification, biological neural network models, and unstructured text data retrieval, addressing problems such as improper attention weight allocation.

Inactive Publication Date: 2019-04-02
SOUTH CHINA NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

Many classic models, as mentioned above, have achieved remarkable results on a variety of text classification problems. For document classification, however, these models still have the following shortcomings: 1. They ignore the hierarchical structure information of a document, from words to sentences and from sentences to the document, and directly feed each word vector into the deep network; 2. They use a single attention mechanism to determine the contribution weight of each part of the document, without fully accounting for the different characteristics of the word-to-sentence and sentence-to-document stages, so the internal structure information of the document cannot be exploited effectively.



Examples


Detailed Description of the Embodiments

[0044] The implementation of the invention will be further described below in conjunction with the accompanying drawings and examples, but the implementation and protection of the present invention are not limited thereto. For any process or symbol that is not described in detail below, those skilled in the art can refer to the prior art to understand or realize it.

[0045] A document classification method based on a hierarchical multi-attention network of this example comprises the steps of: (1) according to the modeling characteristics of documents in text classification, using a bidirectional GRU sequence model to model the document from words to sentences and from sentences to the document, so that the hierarchical structure of the document is fully embodied in the model; (2) for the word-to-sentence stage, in order to accurately express the importance of different words within a sentence, the present invention uses a bidirectional GRU sequence model to encode each wor...
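The word-level stage of steps (1) and (2) can be illustrated with a minimal sketch. The module below is an illustrative assumption, not the patent's reference implementation: a bidirectional GRU encodes each word in its sentence context, and a soft-attention layer assigns each word a weight before the weighted states are pooled into a sentence vector. The module name `WordAttentionEncoder` and all layer sizes are hypothetical.

```python
# Minimal sketch of the word-level Bi-GRU encoder with soft attention.
# Dimensions and names are illustrative assumptions, not taken from the patent.
import torch
import torch.nn as nn


class WordAttentionEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=50):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Bidirectional GRU: each word state has 2 * hidden_dim features.
        self.gru = nn.GRU(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        # Soft attention: score each word state, normalize scores with softmax.
        self.attn = nn.Sequential(
            nn.Linear(2 * hidden_dim, 2 * hidden_dim),
            nn.Tanh(),
            nn.Linear(2 * hidden_dim, 1),
        )

    def forward(self, word_ids):
        # word_ids: (batch_of_sentences, words_per_sentence)
        states, _ = self.gru(self.embedding(word_ids))               # (B, W, 2H)
        scores = self.attn(states).squeeze(-1)                       # (B, W)
        weights = torch.softmax(scores, dim=-1)                      # word attention weights
        sentence_vec = (weights.unsqueeze(-1) * states).sum(dim=1)   # (B, 2H)
        return sentence_vec, weights
```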



Abstract

The invention discloses a document classification method based on a hierarchical multi-attention network. The method comprises the following steps: using a Bi-GRU sequence model to carry out word-to-sentence and sentence-to-document modeling of the document; using a Bi-GRU sequence model to encode each word, obtaining the context information within the sentence, and using soft attention to distribute a weight to each word; and, for the process from sentences to the document, introducing CNN attention and using a CNN model to obtain the locally relevant features between the sentences within a window, from which the attention weight of each sentence is further obtained. Modeling can thus be carried out from words to sentences and from sentences to the document according to the document's characteristics, so the hierarchical structure of the document is fully considered. Meanwhile, different attention mechanisms are adopted at the word level and the sentence level respectively to appropriately distribute the weights of the related content, thereby improving document classification accuracy.
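As a companion to the word-level sketch above, the sentence-to-document stage described in the abstract can be sketched as follows. This is likewise an assumption-labeled illustration rather than the patent's implementation: a Bi-GRU encodes the sequence of sentence vectors, a one-dimensional CNN over a window of neighbouring sentences extracts locally relevant features, and those features are normalized into per-sentence attention weights used to pool the document vector for classification. The module name `SentenceCNNAttentionClassifier`, the window size, and the dimensions are hypothetical.

```python
# Minimal sketch of the sentence-level Bi-GRU with CNN attention and the
# final classifier. Dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn


class SentenceCNNAttentionClassifier(nn.Module):
    def __init__(self, sent_dim=100, hidden_dim=50, window=3, num_classes=5):
        super().__init__()
        self.gru = nn.GRU(sent_dim, hidden_dim, bidirectional=True, batch_first=True)
        # CNN attention: a 1-D convolution looks at `window` adjacent sentence
        # states and produces one attention score per sentence.
        self.conv = nn.Conv1d(2 * hidden_dim, 1, kernel_size=window, padding=window // 2)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, sentence_vecs):
        # sentence_vecs: (batch_of_documents, sentences_per_doc, sent_dim)
        states, _ = self.gru(sentence_vecs)                     # (B, S, 2H)
        scores = self.conv(states.transpose(1, 2)).squeeze(1)   # (B, S)
        weights = torch.softmax(scores, dim=-1)                 # sentence attention weights
        doc_vec = (weights.unsqueeze(-1) * states).sum(dim=1)   # (B, 2H)
        return self.fc(doc_vec), weights
```

An assumed wiring of the two sketches, with made-up batch shapes, might look like this:

```python
# Illustrative only: 32 documents, 10 sentences each, 20 words per sentence.
word_enc = WordAttentionEncoder(vocab_size=5000)
doc_clf = SentenceCNNAttentionClassifier()
docs = torch.randint(1, 5000, (32, 10, 20))
sent_vecs, _ = word_enc(docs.view(32 * 10, 20))      # encode every sentence
logits, _ = doc_clf(sent_vecs.view(32, 10, -1))      # classify every document
```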

Description

Technical field
[0001] The invention belongs to the field of natural language processing and sentiment analysis, and in particular relates to a document classification method based on a hierarchical multi-attention network.
Background technique
[0002] Text classification is one of the important topics in the field of natural language processing. With the continuous growth of data volumes and hardware computing power, the theory and methods of text classification are playing an increasingly important role and have received widespread attention. Early text classification research was mainly based on knowledge engineering systems, which required experts in a given field to hand-craft classification rules for texts in that field; however, this approach requires a lot of manpower to expand or modify the rules and to carry out extensive maintenance. Later, with the development of machine learning technology, text classification methods based on machine learning grad...


Application Information

IPC (8): G06F16/35; G06F17/27; G06N3/04
CPC: G06F40/211; G06F40/284; G06N3/045
Inventor: 黄英仁, 王子文, 薛云
Owner: SOUTH CHINA NORMAL UNIVERSITY