Multi-source and multi-label text classification method and system based on improved seq2seq model

A text classification and multi-label technology, applied in the field of multi-source multi-label text classification methods and systems, that improves classification accuracy

Active Publication Date: 2019-02-01
Guangzhou Semantic Technology Co., Ltd. (广州语义科技有限公司)
Cites: 9; Cited by: 34

AI Technical Summary

Problems solved by technology

[0005] The present invention proposes a multi-source multi-label text classification method and system based on an improved seq2seq model. The main improvements over the traditional seq2seq model are the addition of multiple encoders and the definition of a loss function that is insensitive to label order. Together, these effectively address text classification problems in which the input is multi-source text data and the output is a set of multiple labels.
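This excerpt does not give the exact loss function. As a minimal sketch of what "insensitive to label order" can mean, assume the decoder emits one probability distribution over labels per step, plus a hypothetical end token `<eos>`. The hypothetical `order_insensitive_loss` scores each step against whichever not-yet-matched gold label the model currently rates highest, so emitting the same labels in a different order yields the same loss. `fixed_order_loss` is the conventional seq2seq cross-entropy against one fixed order, shown for contrast. This greedy matching is an illustrative choice, not necessarily the patent's definition.

```python
import math
from typing import Dict, List, Sequence, Set

END = "<eos>"  # hypothetical end-of-sequence label
EPS = 1e-12    # guards against log(0)


def order_insensitive_loss(step_probs: Sequence[Dict[str, float]],
                           gold: Set[str]) -> float:
    """Greedy set matching: each decoding step is scored against the
    still-unmatched gold label with the highest predicted probability;
    once every gold label is consumed, the next step must predict END."""
    remaining = set(gold)
    total = 0.0
    for probs in step_probs:
        if remaining:
            # sorted() makes tie-breaking deterministic
            target = max(sorted(remaining), key=lambda lab: probs.get(lab, 0.0))
            remaining.discard(target)
        else:
            target = END
        total -= math.log(max(probs.get(target, 0.0), EPS))
        if target == END:
            break
    return total


def fixed_order_loss(step_probs: Sequence[Dict[str, float]],
                     gold_seq: List[str]) -> float:
    """Conventional seq2seq cross-entropy against one fixed label order."""
    targets = gold_seq + [END]
    return -sum(math.log(max(p.get(t, 0.0), EPS))
                for p, t in zip(step_probs, targets))


# Two decoder outputs that emit the same two labels in opposite orders.
gold = {"taking drugs", "intentionally hurting people"}
drugs_first = [
    {"taking drugs": 0.7, "intentionally hurting people": 0.2, END: 0.1},
    {"taking drugs": 0.1, "intentionally hurting people": 0.8, END: 0.1},
    {"taking drugs": 0.05, "intentionally hurting people": 0.05, END: 0.9},
]
hurting_first = [
    {"taking drugs": 0.2, "intentionally hurting people": 0.7, END: 0.1},
    {"taking drugs": 0.8, "intentionally hurting people": 0.1, END: 0.1},
    {"taking drugs": 0.05, "intentionally hurting people": 0.05, END: 0.9},
]
loss_a = order_insensitive_loss(drugs_first, gold)
loss_b = order_insensitive_loss(hurting_first, gold)
loss_fixed = fixed_order_loss(hurting_first,
                              ["taking drugs", "intentionally hurting people"])
```

Both orderings receive the same order-insensitive loss, while the fixed-order loss penalizes `hurting_first` heavily even though it predicts the correct label set.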

Examples

Embodiment Construction

[0038] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0039] As shown in Figure 1, the present invention provides a multi-source multi-label text classification method based on an improved seq2seq model. The method comprises the following steps:

[0040] Step 1, data input and preprocessing: perform word segmentation and stop-word removal on the input multi-source text corpus, construct a Chinese vocabulary of the input corpus, serialize that Chinese vocabulary, and in the Chinese vocabula...
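The visible part of Step 1 can be sketched as follows. This is a minimal illustration, assuming the text is already word-segmented (a real pipeline would run a Chinese word segmenter first), that one vocabulary is shared across all sources, and that ids 0 and 1 are reserved for hypothetical `<pad>` and `<unk>` tokens. The function names, stop-word list, and sample tokens are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

PAD, UNK = "<pad>", "<unk>"  # hypothetical reserved tokens (ids 0 and 1)


def remove_stop_words(tokens: List[str], stop_words: Set[str]) -> List[str]:
    """Drop stop words from an already-segmented token list."""
    return [t for t in tokens if t not in stop_words]


def build_vocab(docs: Iterable[List[str]], min_freq: int = 1) -> Dict[str, int]:
    """Map each token to an integer id, most frequent tokens first."""
    counts = Counter(tok for doc in docs for tok in doc)
    vocab = {PAD: 0, UNK: 1}
    # Ties are broken by the token string so ids are reproducible.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def serialize(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert tokens to ids, truncating or right-padding to max_len."""
    ids = [vocab.get(t, vocab[UNK]) for t in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Two hypothetical sources, already segmented:
# "suspect / at / home / takes drugs" and "suspect / with a knife / hurts someone"
source_a = [["嫌疑人", "在", "家中", "吸毒"]]
source_b = [["嫌疑人", "持刀", "伤人"]]
stop_words = {"在"}  # hypothetical stop-word list

cleaned = [remove_stop_words(doc, stop_words) for doc in source_a + source_b]
vocab = build_vocab(cleaned)  # one vocabulary shared by all sources (an assumption)
ids = serialize(["嫌疑人", "未登录词"], vocab, max_len=4)  # second token is out of vocabulary
```

Out-of-vocabulary words map to the `<unk>` id and short sequences are padded, so every source yields fixed-length id sequences ready for word embedding.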

Abstract

The invention belongs to the technical field of natural language processing text classification, and in particular relates to a multi-source multi-label text classification method based on an improved seq2seq model, and a system thereof. The method comprises the following steps: data input and preprocessing, word embedding, encoding, encoding concatenation, decoding, model optimization, and prediction output. The method of the invention has the following beneficial effects: it adopts a seq2seq deep learning framework, constructs multiple encoders, and combines them with an attention mechanism for the text classification task, so as to make maximal use of multi-source corpus information and improve multi-label classification accuracy. In the error feedback process of the decoding step, an intervention mechanism is added according to the characteristics of multi-label text to avoid the influence of label ordering, which better fits the nature of the multi-label classification problem. The encoder adopts a recurrent neural network, which can learn effectively across time steps. The decoding layer adopts a unidirectional recurrent neural network with an added attention mechanism to highlight the learning focus.
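The abstract names the architecture but not its equations. The following is a minimal sketch under stated assumptions: each source gets its own plain Elman RNN encoder, "encoding concatenation" is taken to mean concatenating the encoders' hidden-state sequences along the time axis, and the decoder is a single unidirectional RNN step that uses dot-product (Luong-style) attention over that concatenated memory. All function names and toy weights are hypothetical. The output layer that maps the decoder state to label probabilities is omitted.

```python
import math
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_encode(inputs: List[Vec], w_x: Mat, w_h: Mat) -> List[Vec]:
    """Elman RNN encoder: h_t = tanh(W_x x_t + W_h h_{t-1}), h_0 = 0.
    Returns the hidden state at every time step."""
    h = [0.0] * len(w_h)
    states = []
    for x in inputs:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h))]
        states.append(h)
    return states


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vec, memory: List[Vec]) -> Tuple[Vec, Vec]:
    """Dot-product attention: weight each memory state by softmax(query . state)."""
    weights = softmax([sum(q * k for q, k in zip(query, state)) for state in memory])
    context = [sum(w * state[i] for w, state in zip(weights, memory))
               for i in range(len(query))]
    return context, weights


def decoder_step(prev_state: Vec, prev_label_emb: Vec, memory: List[Vec],
                 w_y: Mat, w_s: Mat, w_c: Mat) -> Tuple[Vec, Vec]:
    """One unidirectional RNN decoder step conditioned on an attention context."""
    context, weights = attend(prev_state, memory)
    new_state = [math.tanh(a + b + c) for a, b, c in
                 zip(matvec(w_y, prev_label_emb), matvec(w_s, prev_state),
                     matvec(w_c, context))]
    return new_state, weights


# Toy example: two sources, each with its own encoder weights (hidden size 2).
I2 = [[1.0, 0.0], [0.0, 1.0]]
src_a = [[0.5, 0.1], [0.2, 0.4]]              # 2 embedded tokens from source A
src_b = [[0.3, 0.3], [0.0, 0.9], [0.6, 0.2]]  # 3 embedded tokens from source B
enc_a = rnn_encode(src_a, I2, [[0.5, 0.0], [0.0, 0.5]])
enc_b = rnn_encode(src_b, [[0.8, 0.1], [0.1, 0.8]], I2)
memory = enc_a + enc_b  # concatenate along the time axis
state, weights = decoder_step([0.1, -0.2], [0.0, 1.0], memory, I2, I2, I2)
```

Because the decoder attends over the concatenated states of every encoder, each decoding step can draw on whichever source is most relevant to the label being predicted.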

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing text classification, and in particular relates to a multi-source multi-label text classification method and system based on an improved seq2seq model.

Background art

[0002] Automatic text classification is one of the main tasks of natural language processing. Multi-label text classification handles cases in which a text belongs to more than one category. Such problems are very common in real life: for example, a single description text may correspond to multiple police categories, such as "taking drugs" and "intentionally hurting people". However, multi-label text classification has received less research than single-label text classification, and its performance generally falls short of the single-label level.

[0003] In addition, the description of a thing may correspond to multiple text...

Claims

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35; G06F17/27; G06N3/04
CPC: G06F40/289; G06N3/045
Inventors: 谢松县, 高辉, 陈仲生, 彭立宏, 曾道建, 桂林, 封黎, 李磊
Owner: Guangzhou Semantic Technology Co., Ltd. (广州语义科技有限公司)