
Multi-source multi-label text classification method and system based on improved seq2seq model

A multi-label text classification technology, applied to multi-source multi-label text classification methods and systems, with the effect of improving classification accuracy.

Active Publication Date: 2020-06-23
广州语义科技有限公司
9 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] The present invention proposes a multi-source multi-label text classification method and system based on an improved seq2seq model. The main improvements over the traditional seq2seq model are the addition of multiple encoders and the definition of a loss function that is insensitive to label order. Together, these address text classification problems in which the input is multi-source text data and the output consists of multiple labels.
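The idea of a loss that is insensitive to label order can be illustrated with a small sketch. The exact loss used by the invention is not given in this excerpt, so the formulation below is an assumption: at each decoding step the loss is the negative log of the total probability assigned to the gold labels not yet emitted, and the most probable remaining gold label is then treated as emitted. The function name `order_insensitive_loss` and the example labels are hypothetical.

```python
import math


def order_insensitive_loss(step_probs, gold_labels):
    """Sketch of a label-order-insensitive sequence loss (assumed formulation).

    step_probs: list of dicts mapping label -> probability, one per decoding step.
    gold_labels: iterable of gold labels; their order is deliberately ignored.
    """
    remaining = set(gold_labels)
    loss = 0.0
    for probs in step_probs:
        if not remaining:
            break
        # Reward probability mass on ANY gold label that is still outstanding.
        mass = sum(probs.get(label, 0.0) for label in remaining)
        loss += -math.log(max(mass, 1e-12))
        # Treat the most probable outstanding gold label as emitted at this step.
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda label: probs.get(label, 0.0))
        remaining.discard(best)
    return loss


# Two decoding steps over three hypothetical labels.
step_probs = [
    {"drugs": 0.6, "assault": 0.3, "theft": 0.1},
    {"drugs": 0.1, "assault": 0.7, "theft": 0.2},
]
loss_a = order_insensitive_loss(step_probs, ["drugs", "assault"])
loss_b = order_insensitive_loss(step_probs, ["assault", "drugs"])
```

Because the gold labels are handled as a set, `loss_a` and `loss_b` are equal, whereas a standard per-step cross-entropy would penalize the second ordering.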



Examples


Embodiment Construction

[0038] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0039] As shown in Figure 1, the present invention provides a multi-source multi-label text classification method based on an improved seq2seq model. The method comprises the following steps:

[0040] Step 1, data input and preprocessing: perform word segmentation and stop-word removal on the input multi-source text corpus, construct the Chinese vocabulary of the input corpus, serialize the Chinese vocabulary of the input corpus, and in the Chinese vocabula...
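The visible part of Step 1 can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes the text has already been segmented into whitespace-separated words (a real system would plug a Chinese word segmenter in through the hypothetical `segment` parameter), and the `<pad>` and `<unk>` tokens, the `min_count` threshold, and all function names are assumptions introduced for the example.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # assumed special tokens, not taken from the patent text


def preprocess(text, stop_words, segment=str.split):
    """Segment one text and drop stop words. `segment` is a pluggable tokenizer."""
    return [tok for tok in segment(text) if tok not in stop_words]


def build_vocab(token_lists, min_count=1):
    """Build a word -> id vocabulary shared by all sources."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {PAD: 0, UNK: 1}
    # Most frequent words first; ties broken by the word itself for determinism.
    for tok, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def serialize(tokens, vocab, max_len):
    """Map tokens to ids, truncating or padding to a fixed length."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Two hypothetical sources describing the same event, already word-segmented.
sources = {
    "report": "嫌疑人 在 家中 吸食 毒品",
    "statement": "嫌疑人 故意 伤害 他人",
}
stop_words = {"在"}
tokens = {name: preprocess(text, stop_words) for name, text in sources.items()}
vocab = build_vocab(tokens.values())
ids = {name: serialize(toks, vocab, max_len=6) for name, toks in tokens.items()}
```

Each source keeps its own id sequence in `ids`, so that each can later be fed to its own encoder.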



Abstract

The invention belongs to the technical field of natural language processing text classification, and specifically provides a multi-source multi-label text classification method and system based on an improved seq2seq model. The method includes the following steps: data input and preprocessing, word embedding, encoding, encoding concatenation, decoding, model optimization, and prediction output. The method has the following beneficial effects: it uses the seq2seq deep learning framework to construct multiple encoders and combines them with an attention mechanism for the text classification task, which makes fuller use of multi-source corpus information and improves the accuracy of multi-label classification; in the error feedback process, an intervention mechanism tailored to the characteristics of multi-label text removes the influence of label ordering, which is more in line with the nature of multi-label classification; the encoder uses a recurrent neural network, which learns effectively across time steps; and the decoding layer uses a unidirectional recurrent neural network with an attention mechanism to focus learning on the key information.
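The encoding concatenation and attention steps named in the abstract can be sketched as below. The abstract does not say how the encoder outputs are joined or which attention scoring function is used, so this example assumes that the hidden-state sequences of the separate encoders are joined along the time axis and that attention uses dot-product scores; the encoder outputs are given as plain lists rather than produced by a real recurrent network, and all names are hypothetical.

```python
import math


def concat_sources(encoded_sources):
    """Join per-source encoder state sequences along the time axis (assumed)."""
    return [state for states in encoded_sources for state in states]


def attention(decoder_state, encoder_states):
    """Dot-product attention (assumed scoring) over the concatenated states."""
    scores = [
        sum(d * h for d, h in zip(decoder_state, state)) for state in encoder_states
    ]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical outputs of two encoders, one per text source (hidden size 2).
source_a = [[1.0, 0.0], [0.0, 1.0]]
source_b = [[1.0, 1.0]]
states = concat_sources([source_a, source_b])
weights, context = attention([1.0, 0.0], states)
```

With this layout the decoder attends over positions from every source at once, so a single context vector can draw on whichever source is most relevant at the current decoding step.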

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing text classification, and in particular relates to a multi-source multi-label text classification method and system based on an improved seq2seq model.

Background

[0002] Automatic text classification is one of the main tasks of natural language processing. Multi-label text classification handles cases in which a text belongs to more than one category. Such problems are very common in practice: for example, a single description text may correspond to multiple police categories, such as "taking drugs" and "intentionally hurting people". However, compared with single-label text classification, multi-label text classification has been studied less, and its performance generally does not reach the level of single-label text classification.

[0003] In addition, the description of a thing may correspond to multiple text...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/289, G06N3/04, G06N3/08
CPC: G06F40/289, G06N3/045
Inventor: 谢松县高辉陈仲生彭立宏曾道建桂林封黎李磊
Owner: 广州语义科技有限公司