Multi-label text classification method and system

A multi-label text classification technology, applied in text database clustering/classification, unstructured text data retrieval, special data processing applications, etc. It addresses the problems of increased training error and reduced model performance and classification accuracy, improving accuracy and avoiding unnecessary errors.

Active Publication Date: 2019-09-06
QILU UNIV OF TECH
10 Cites, 40 Cited by

AI Technical Summary

Problems solved by technology

During research and development, the inventors found that the existing approach has the following technical problem: probabilities are also calculated for labels that are irrelevant to the text or only weakly relevant to it. These probabilities are unnecessary. Calculating them not only increases the training error but also reduces model performance and classification accuracy.



Examples


Embodiment 1

[0049] To address the problem that existing methods ignore the correlation between labels and text, this embodiment provides a multi-label text classification method based on LSTM-CNN and an attention mechanism. Because the method takes the correlation between labels and text into account, it only needs to calculate the probabilities of the few labels (a label subset) that are most relevant to the text. This significantly improves the prediction efficiency of the model, avoids unnecessary errors, and improves accuracy.
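The label-subset idea in [0049] can be illustrated with a minimal sketch. It assumes label probabilities are already available as a dictionary; the function name `select_label_subset`, the example labels, and the value of `k` are hypothetical and do not come from the patent text.

```python
import heapq


def select_label_subset(label_probs, k):
    """Return the k (label, probability) pairs with the highest probability."""
    if k <= 0:
        return []
    # nlargest returns the pairs sorted from most to least probable.
    return heapq.nlargest(k, label_probs.items(), key=lambda item: item[1])


# Hypothetical probabilities for one text; only the top two labels are kept.
probs = {"sports": 0.05, "finance": 0.55, "politics": 0.30, "health": 0.10}
top = select_label_subset(probs, 2)
top_names = [name for name, _ in top]
```

Here `top_names` holds the labels judged most relevant to the text; the remaining labels are simply not considered further.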

[0050] Referring to Figure 1, the multi-label text classification method comprises the following steps:

[0051]-[0052] S101: Given a training set containing text sequences and a label space, a long short-term memory network (LSTM) extracts the global feature vectors of all words in the text sequence, and a convolutional neural network (CNN) aggregates the global feature vectors of all the words in the text sequence to obtain the semantic vectors of a...
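The aggregation half of step S101 can be sketched as a one-dimensional convolution over per-word feature vectors. This is only an illustration under stated assumptions: the LSTM outputs are taken as given inputs, and the kernel width, zero padding, ReLU activation, and the name `conv1d_same` are choices made for the example, not details stated in the source.

```python
def conv1d_same(features, kernel, bias):
    """Aggregate per-word feature vectors with a 1-D convolution.

    features: list of n vectors of length d_in (assumed to be LSTM outputs).
    kernel:   list of `width` matrices, each d_out x d_in; width must be odd.
    bias:     list of d_out floats.
    Returns n vectors of length d_out; zero padding keeps one vector per word.
    """
    width = len(kernel)
    if width % 2 == 0:
        raise ValueError("kernel width must be odd for symmetric padding")
    half = width // 2
    n, d_in, d_out = len(features), len(features[0]), len(bias)
    zero = [0.0] * d_in
    out = []
    for t in range(n):
        acc = list(bias)
        for w in range(width):
            pos = t + w - half
            # Positions outside the sequence contribute a zero vector.
            x = features[pos] if 0 <= pos < n else zero
            for o in range(d_out):
                acc[o] += sum(kernel[w][o][i] * x[i] for i in range(d_in))
        out.append([max(0.0, a) for a in acc])  # ReLU (an assumption)
    return out


# Hypothetical 2-dimensional feature vectors for a three-word text.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Width-3 kernel with one output channel that sums every input in the window.
kernel = [[[1.0, 1.0]], [[1.0, 1.0]], [[1.0, 1.0]]]
semantic = conv1d_same(features, kernel, [0.0])
```

Each output vector mixes a word's feature vector with those of its neighbours, which is one simple way to read "aggregating" the global features into a per-word semantic vector.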

Embodiment 2

[0107] This embodiment provides a multi-label text classification system based on LSTM-CNN and attention mechanism, which is used to implement the multi-label text classification method based on LSTM-CNN and attention mechanism described in the above embodiments.

[0108] Referring to Figure 4, the multi-label text classification system includes an encoding module, a decoding module and a classification module, wherein:

[0109] The encoding module is used to obtain a training set including a text sequence and a label space, use a long short-term memory network to extract the global feature vectors of all words in the text sequence, and use a convolutional neural network to aggregate the obtained global feature vectors to obtain the semantic vector of each word in the text sequence;

[0110] The decoding module is used to calculate the weight coefficients between each label in the label space and all words in the text sequence, construct th...

Embodiment 3

[0113] A computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the multi-label text classification method described above are carried out.


Abstract

The invention discloses a multi-label text classification method and system. The method comprises the following steps: obtaining a training set comprising a text sequence and a label space, extracting global feature vectors of all words in the text sequence using a long short-term memory network, and aggregating the obtained global feature vectors using a convolutional neural network to obtain a semantic vector of each word in the text sequence; calculating the weight coefficients between each label in the label space and all words in the text sequence, constructing an attention weight coefficient matrix, and processing the attention weight coefficient matrix to obtain an optimal weight coefficient matrix; weighting the semantic vector of each word by the weight coefficient vectors in the optimal weight coefficient matrix to obtain an attention vector for each label; and normalizing the attention vectors of the labels to obtain the probability of each label, and selecting the several labels with the highest probabilities to classify the text.
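The decoding and classification steps of the abstract can be sketched as follows. This is a hedged illustration, not the patented procedure: it assumes each label is represented by a vector of the same dimension as the word semantic vectors, uses a dot product as the label-word weight coefficient, treats a row-wise softmax as the "processing" of the weight matrix, and scores each label by the dot product of its vector with its attention vector. None of these choices, nor the names `classify`, `word_vectors` and `label_vectors`, are specified in the source.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def classify(word_vectors, label_vectors, k):
    """Return (label probabilities, indices of the k most probable labels)."""
    # Attention weight matrix: one row per label, one column per word.
    weights = [softmax([dot(lab, h) for h in word_vectors])
               for lab in label_vectors]
    # Attention vector of a label: weighted sum of the word semantic vectors.
    dim = len(word_vectors[0])
    attention = [
        [sum(w * h[i] for w, h in zip(row, word_vectors)) for i in range(dim)]
        for row in weights
    ]
    # Assumed scoring: compare each label vector with its attention vector.
    scores = [dot(lab, c) for lab, c in zip(label_vectors, attention)]
    probs = softmax(scores)  # normalization over the label space
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return probs, ranked[:k]


# Hypothetical semantic vectors for three words and vectors for three labels.
word_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
label_vectors = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
probs, top = classify(word_vectors, label_vectors, 2)
```

In this toy input the first two labels align with different words of the text and the third aligns with none, so the two selected indices in `top` are the first two labels.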

Description

Technical field

[0001] The present disclosure relates to the technical field of text classification, in particular to a multi-label text classification method, system, storage medium and computer device based on LSTM-CNN and an attention mechanism.

Background

[0002] Multi-label text classification is a complex and challenging task in natural language processing. Unlike traditional binary or multi-class classification, multi-label classification handles real-world text that belongs to multiple categories at once.

[0003] At present, there are many machine learning algorithms for multi-label text classification. Depending on how they approach the problem, these algorithms fall into two categories. The first is methods based on problem transformation, which transform the multi-label classification task into multiple binary or multi-class classification problems so that existing algorithms can be applied, such as SVM, DT, Naive Bayes, e...
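The problem-transformation approach mentioned in the background can be illustrated with the common "binary relevance" scheme, which builds one binary dataset per label. This is a minimal sketch under assumptions: the source does not name a specific transformation, and the function name `binary_relevance` and the sample data are hypothetical.

```python
def binary_relevance(samples, label_space):
    """Split one multi-label dataset into one binary dataset per label.

    samples:     list of (text, set_of_labels) pairs.
    label_space: iterable of all possible labels.
    Returns a dict mapping each label to a list of (text, 0 or 1) pairs.
    """
    return {
        label: [(text, int(label in labels)) for text, labels in samples]
        for label in label_space
    }


# Hypothetical training samples with one or more labels each.
samples = [
    ("stocks rally after rate cut", {"finance", "politics"}),
    ("team wins the final", {"sports"}),
]
per_label = binary_relevance(samples, ["finance", "politics", "sports"])
```

Each entry of `per_label` could then be passed to any existing binary classifier. Note that this transformation treats labels independently of one another.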


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F16/35
CPC: G06F16/35
Inventors: 杨振宇, 刘国敬
Owner QILU UNIV OF TECH