A multi-label text classification method based on seq2seq

A multi-label text classification technology based on seq2seq. It addresses the problems that existing methods require time-consuming and labor-intensive manual feature design and give little consideration to label correlation, and it achieves the effect of improving classification accuracy.

Active Publication Date: 2022-06-17
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

Neural networks make up for many shortcomings of traditional machine learning methods in text classification; for example, features no longer need to be designed manually. However, the neural network methods proposed in recent years for multi-label text classification rarely consider the correlation between labels and ignore the importance of local semantic features, which greatly affects classification accuracy.
[0006] To sum up, multi-label text classification methods based on traditional machine learning require manual feature design, which is very time-consuming and labor-intensive, and the quality of the features has a great impact on the classification results.
It is also difficult for such methods to consider the correlation between labels effectively.
Existing multi-label text classification methods based on deep learning can extract effective features automatically, but they do not effectively consider the correlation between labels and they ignore the importance of local features.



Examples

Embodiment 1

[0068] Example 1: with reference to Figure 1, a multi-label text classification method based on seq2seq includes the following steps:

[0069] S1: Preprocess the training corpus;

[0070] S2: Establish a multi-label text classification model based on seq2seq, and train the parameters of the model;

[0071] S3: Use the trained multi-label text classification model to perform text classification on the data to be predicted.

[0072] Further, referring to Figure 2, the preprocessing in S1 includes the following steps:

[0073] 1): Segment the training corpus OrgData into words and remove stop words to obtain the processed corpus NewData, and save it. Stop words here are function words such as "le" and "ge", together with other meaningless tokens such as special symbols.

[0074] 2): Collect the distinct words in NewData to obtain the word set WordSet, then number each word to obtain the word number set WordID corresponding to WordSet;

[0075] 3): Count the labels of the training corpus, obtain the label set LableSet, number eac...
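The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: it assumes whitespace-separated tokens in place of a real word segmenter, a hypothetical stop-word list, and numbering that starts at 0. Step 3 is truncated in the source, so the label numbering here (`label_id`) is an assumption modeled on how step 2 numbers words.

```python
# Hypothetical stop-word list; the source names only "le", "ge" and special symbols.
STOP_WORDS = {"le", "ge", ",", ".", "!", "?"}


def number_items(items):
    """Assign consecutive integer IDs (from 0, an assumption) in first-seen order."""
    ids = {}
    for item in items:
        if item not in ids:
            ids[item] = len(ids)
    return ids


def preprocess(org_data):
    """org_data: list of (text, labels) pairs.

    Returns (new_data, word_id, label_id), loosely mirroring
    NewData, WordID and a numbered label set.
    """
    new_data = []
    for text, labels in org_data:
        # Stand-in for word segmentation: split on whitespace.
        tokens = [t for t in text.split() if t not in STOP_WORDS]
        new_data.append((tokens, labels))

    # Step 2: distinct words and their numbers.
    word_id = number_items(t for tokens, _ in new_data for t in tokens)
    # Step 3 (assumed): distinct labels and their numbers.
    label_id = number_items(l for _, labels in new_data for l in labels)
    return new_data, word_id, label_id


if __name__ == "__main__":
    corpus = [("wo le ai ni", ["a", "b"]), ("ni ge hao", ["b"])]
    new_data, word_id, label_id = preprocess(corpus)
    print(new_data)
    print(word_id)
    print(label_id)
```

For Chinese text, the `text.split()` call would need to be replaced by an actual word segmentation tool; the rest of the sketch is independent of that choice.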


Abstract

The invention discloses a seq2seq-based multi-label text classification method in the field of label text classification, comprising the steps: S1: preprocess the training corpus; S2: establish a seq2seq-based multi-label text classification model and train its parameters; S3: use the trained multi-label text classification model to classify the data to be predicted. The invention does not require manual feature extraction. It uses a CNN to extract local semantic information from the text, and it uses an initialized fully connected layer to take the correlation between labels into account; both improve the accuracy of text classification.

Description

Technical field

[0001] The invention relates to the field of label text classification, in particular to a multi-label text classification method based on seq2seq.

Background technique

[0002] Traditional text classification techniques mainly focus on single-label classification, in which a text corresponds to only one category label. In practice, however, multi-label text classification is both more common and more difficult than single-label classification, because the number of label subsets grows exponentially with the number of labels. If a multi-label classification problem has K labels, the theoretical total number of label subsets is 2^K - 1, so selecting the correct label subset from this exponential number of candidates is a huge challenge. To address this challenge, it is often necessary to exploit the correlation between labels to facilitate the learning process. For example: if ...
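To make the subset count in the background concrete, the short sketch below enumerates every non-empty label subset for a small label set and checks the result against 2^K - 1. The label names and the function name are purely illustrative and do not come from the source.

```python
from itertools import combinations


def nonempty_label_subsets(labels):
    """Enumerate every non-empty subset of the given labels."""
    return [
        set(combo)
        for size in range(1, len(labels) + 1)
        for combo in combinations(labels, size)
    ]


if __name__ == "__main__":
    labels = ["sports", "politics", "tech"]  # illustrative labels, K = 3
    subsets = nonempty_label_subsets(labels)
    print(len(subsets))  # 7, which equals 2**3 - 1
```

With only 20 labels the same count already exceeds one million, which is why brute-force search over subsets is impractical and label correlation is used to narrow the candidates.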


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/35, G06F40/289, G06F40/30, G06N3/04, G06N3/08
CPC: G06F16/35, G06N3/08, G06N3/045
Inventors: 廖伟智, 王宇, 马攀, 阴艳超
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA