Multi-label text classification method and system based on attention mechanism

A text classification and attention mechanism technology, applied in neural learning methods, computer components, instruments, etc. It addresses the problems that existing methods ignore the correlation between labels and text and have low classification accuracy, and it achieves the effect of improving accuracy.

Pending Publication Date: 2022-01-18
GUANGDONG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0006] Most current multi-label text classification methods ignore the correlation between labels and text, and their classification accuracy is low when the label set is large and the category distribution is unbalanced. To solve this problem, the present invention proposes a multi-label text classification method and system based on an attention mechanism. Starting from the correlation between labels and texts, it uses the attention mechanism to capture the latent relationship between labels and texts, which improves the accuracy of multi-label text classification.

Examples

Embodiment 1

[0062] Most current multi-label text classification methods ignore the correlation between labels and texts, and their classification accuracy is low when the label set is large and the category distribution is unbalanced. This embodiment therefore proposes a multi-label text classification method based on an attention mechanism. The flowchart of the method is shown in Figure 1. The method includes the following steps:

[0063] S1. Obtain a text training set containing labels;

[0064] In this embodiment, before the text training set containing labels is obtained, the method also includes obtaining the text dataset to be classified and performing a preprocessing operation on the text to be classified in that dataset; the text training set is then obtained from the dataset.

[0065] The preprocessing operations on the text to be classified in the dataset include:

[0066] Use regular expressions to perform text filtering on the text to be classified, and then per...
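The regular-expression filtering named in [0066] can be sketched as follows. This is a minimal illustration only: the excerpt does not say what is filtered out, so the patterns (HTML tags, URLs, punctuation) and the function name `filter_text` are assumptions, and whatever step follows the filtering in the truncated paragraph is not reproduced here.

```python
import re

# Hypothetical patterns: the source only states that regular expressions are
# used for text filtering; what gets removed here is an assumption.
_TAG = re.compile(r"<[^>]+>")         # HTML/XML tags
_URL = re.compile(r"https?://\S+")    # web addresses
_NON_WORD = re.compile(r"[^\w\s]")    # punctuation and symbols
_SPACES = re.compile(r"\s+")          # runs of whitespace


def filter_text(raw: str) -> str:
    """Strip tags, URLs and punctuation, then normalise whitespace and case."""
    text = _TAG.sub(" ", raw)
    text = _URL.sub(" ", text)
    text = _NON_WORD.sub(" ", text)
    return _SPACES.sub(" ", text).strip().lower()


if __name__ == "__main__":
    print(filter_text("<p>Hello, world! See https://example.com/a?b=1</p>"))
```

Because `\w` matches Unicode word characters in Python 3, this sketch keeps non-Latin text such as Chinese characters and removes only markup, links and punctuation.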

Embodiment 2

[0103] As shown in Figure 3, the present invention also proposes a multi-label text classification system based on an attention mechanism, which is used to implement the multi-label classification method proposed in Embodiment 1. The system includes:

[0104] A training set obtaining module 11, used to obtain the text training set containing labels;

[0105] A word vector conversion module 12, used to perform word vectorization on the text in the text training set and convert that text into a multidimensional text feature vector;

[0106] A label structure matrix acquisition module 13, which constructs a label co-occurrence graph according to the co-occurrence of labels in the text training set, introduces a graph embedding algorithm to optimize the similarity between labels in the label co-occurrence graph, and obtains a label structure matrix;

[0107] Multi-label text classification model construction module 14, for constructing the multi-label tex...
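The first half of what module 13 does, building a graph from how often labels appear together, can be sketched as below. This is an illustration under assumptions: the function name `label_cooccurrence` and the use of raw pair counts as edge weights are hypothetical, and the graph embedding step that turns this graph into the label structure matrix is deliberately left out because the excerpt does not name the algorithm.

```python
from collections import Counter
from itertools import combinations


def label_cooccurrence(label_sets):
    """Build a symmetric co-occurrence matrix from per-document label sets.

    Returns (labels, adj) where adj[i][j] counts the documents that carry
    both labels[i] and labels[j]. Raw counts as edge weights are an
    assumption; the source does not specify the weighting.
    """
    labels = sorted({label for s in label_sets for label in s})
    index = {label: i for i, label in enumerate(labels)}

    pair_counts = Counter()
    for s in label_sets:
        # Each unordered pair of labels on one document adds one to its edge.
        for a, b in combinations(sorted(set(s)), 2):
            pair_counts[(a, b)] += 1

    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for (a, b), count in pair_counts.items():
        i, j = index[a], index[b]
        adj[i][j] = adj[j][i] = count
    return labels, adj


if __name__ == "__main__":
    docs = [{"sports", "news"}, {"news", "politics"}, {"sports", "news"}]
    names, matrix = label_cooccurrence(docs)
    print(names)
    for row in matrix:
        print(row)
```

A graph embedding method would take `adj` as input and produce one vector per label; stacking those vectors would give a matrix playing the role of the label structure matrix described in [0106].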

Abstract

The invention provides a multi-label text classification method and system based on an attention mechanism and relates to the technical field of multi-label text classification. It addresses the problem that most current multi-label text classification methods neglect the relevance between labels and texts and have low classification accuracy when the label scale is large and the category distribution is unbalanced. The similarity between labels is optimized with a graph embedding algorithm to obtain a label structure matrix that preserves both the global and the local structure of the labels. A multi-label text classification model is constructed from a convolutional neural network and an attention mechanism: the convolutional neural network extracts deep text features, and the attention mechanism captures the latent relationship between the label structure and the document content. Label information in the training set can thus be fully used even when the label scale is large and the label distribution is unbalanced, which improves the accuracy of multi-label text classification.
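One common way to let label vectors attend over text features is dot-product attention, sketched below. This is not taken from the patent: the excerpt gives no formulas, so the dot-product scoring, the softmax normalisation, the sigmoid output and every name here (`label_attention`, `label_probabilities`) are assumptions. The text features stand in for convolutional network outputs and the label vectors stand in for rows of the label structure matrix.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def label_attention(text_features, label_matrix):
    """Attend from each label vector over the text feature vectors.

    text_features: n vectors of dimension d (assumed CNN outputs).
    label_matrix:  k vectors of dimension d (assumed label embeddings).
    Returns (weights, contexts): k x n attention weights and k x d
    label-specific summaries of the text.
    """
    weights, contexts = [], []
    for label_vec in label_matrix:
        w = softmax([dot(label_vec, h) for h in text_features])
        ctx = [
            sum(w_i * h[j] for w_i, h in zip(w, text_features))
            for j in range(len(label_vec))
        ]
        weights.append(w)
        contexts.append(ctx)
    return weights, contexts


def label_probabilities(contexts, label_matrix):
    """Independent sigmoid score per label (an assumed output layer)."""
    return [1.0 / (1.0 + math.exp(-dot(c, l))) for c, l in zip(contexts, label_matrix)]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    labels = [[2.0, 0.0], [0.0, 2.0]]
    w, ctx = label_attention(feats, labels)
    print(w)
    print(label_probabilities(ctx, labels))
```

Scoring each label independently with a sigmoid, rather than with one softmax across labels, reflects the multi-label setting in which a document may carry several labels at once.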

Description

Technical field

[0001] The present invention relates to the technical field of multi-label text classification, and more specifically to an attention-mechanism-based multi-label text classification method and system.

Background

[0002] With the rapid development of Internet technology, the fast generation and dissemination of information has brought profound changes to all sectors of society. In the shift from "information scarcity" to today's "information explosion", the exponential growth of information has put vast amounts of technology and information within easy reach, but it has also let all kinds of useless spam enter people's lives. Faced with so much data, manually sorting and selecting the information people need item by item is extremely time-consuming. How to classify the obtained information accurately and quickly is therefore an urgent problem. [000...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F40/289, G06N3/04, G06N3/08
CPC: G06F40/289, G06N3/084, G06N3/045, G06F18/2415
Inventors: 郭绮雯, 王勇
Owner: GUANGDONG UNIV OF TECH