Multi-label long text classification method introducing multi-path selection fusion mechanism

A multi-way selection and classification method, applied in the field of multi-label long text classification, that introduces a multi-way selection fusion mechanism to achieve strong feature extraction capability, fast prediction, and short training time

Active Publication Date: 2019-08-16
UNIV OF ELECTRONIC SCI & TECH OF CHINA
Cites: 7; Cited by: 8

AI Technical Summary

Problems solved by technology

[0005] In view of the deficiencies in the prior art, the present invention provides a multi-label long text classification method that introduces a mu



Examples


Embodiment 1

[0034] As shown in Figures 1 to 4:

[0035] For the training set of 3 million records released by a machine learning challenge, the title data and description data are spliced to obtain long text data. For records without a description, a copy of the question (the title) is used as the description. The 3 million records are then divided into 200,000 for a validation set, 200,000 for a test set, and the remaining 2.6 million for a training set.
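The splicing and splitting steps in [0035] can be sketched as follows. This is a minimal Python illustration, not the inventors' code: the record layout (dicts with hypothetical `title` and `description` keys), the space used to join the two fields, and the shuffle before splitting are all assumptions.

```python
import random


def build_long_text(record):
    """Splice title and description into one long text.

    Assumption: a record is a dict with a "title" key and an optional
    "description" key (hypothetical layout). If the description is missing
    or empty, a copy of the title is used as the description.
    """
    title = record["title"]
    description = record.get("description") or title
    return title + " " + description


def split_dataset(records, n_val=200_000, n_test=200_000, seed=0):
    """Split records into (train, validation, test).

    Assumption: records are shuffled with a fixed seed before splitting;
    the source does not say how the split is drawn.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test
```

On the full 3-million-record set, `split_dataset(records)` with the default sizes yields 200,000 validation, 200,000 test, and 2,600,000 training records.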

[0036] After low-frequency words are removed from the data, the vocabulary required by the encoder is built, along with the vocabulary of category labels required by the decoder. A sequence start symbol is prepended to the label sequence to form the decoder input, and a sequence end symbol is appended to the label sequence to form the decoder output. For example, for the input long text $x_1, x_2, \ldots, x_n$ labeled $l_1, l_2, \ldots, l_{n'}$, the sequence start symbol is ..., the sequence end symbol is ..., then t...
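A sketch of the vocabulary and decoder input/output construction in [0036], assuming whitespace-tokenized text. The actual start and end symbols are not recoverable from the source, so `<s>` and `</s>` below are placeholders, `<unk>` is a hypothetical stand-in for removed words, and `min_freq` is a hypothetical cutoff for what counts as a low-frequency word.

```python
from collections import Counter

# Placeholder special symbols (hypothetical; the source's symbols are lost).
BOS, EOS, UNK = "<s>", "</s>", "<unk>"


def build_vocab(token_lists, min_freq=2, specials=()):
    """Map tokens to integer ids, dropping tokens seen fewer than min_freq times.

    Special symbols get the first ids; remaining tokens are added in sorted
    order so the mapping is deterministic. min_freq is an assumed threshold.
    """
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, count in sorted(counts.items()):
        if count >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def decoder_io(labels):
    """Return (decoder input, decoder output) for one label sequence.

    Input is the start symbol followed by the labels; output is the labels
    followed by the end symbol, so output[i] is the target for input[i].
    """
    return [BOS] + list(labels), list(labels) + [EOS]
```

For example, the label vocabulary could be built with `build_vocab(label_sequences, min_freq=1, specials=(BOS, EOS))` so that both special symbols receive ids, while the encoder vocabulary would reserve `UNK` for filtered-out words.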



Abstract

The invention provides a multi-label long text classification method introducing a multi-path selection fusion mechanism, and relates to the technical field of multi-label long text classification based on a sequence-to-sequence architecture. The invention improves the performance of multi-label long text classification built on a sequence-to-sequence architecture. Using data released by a machine learning challenge, the title data and the description data are spliced to obtain long text data; for records without a description, the question is copied as the description. Low-frequency words are then removed from the data to obtain more effective data. A Transformer model augmented with the multi-path selection fusion mechanism is applied to the resulting data to generate a label sequence for each input long text, and redundant information is effectively removed during decoding. On the test data, the recall of the label sequences generated by the model is 0.5% higher than that of a model without multi-path selection fusion, and the accuracy and F1 value are both improved by 1%.
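The abstract reports recall, accuracy and F1 for the generated label sequences but does not say how they are averaged. As one common choice (an assumption here, as is reading "accuracy" as precision), micro-averaged precision, recall and F1 over predicted versus gold label sets can be computed as follows.

```python
def micro_prf(predicted, gold):
    """Micro-averaged precision, recall and F1 over per-example label sets.

    predicted, gold: sequences of label collections, one per example.
    Each collection is converted to a set, so a label generated twice in
    one sequence is counted once. Averaging scheme is an assumption.
    """
    tp = fp = fn = 0
    for pred, true in zip(predicted, gold):
        p, t = set(pred), set(true)
        tp += len(p & t)  # labels both predicted and correct
        fp += len(p - t)  # predicted but not in the gold set
        fn += len(t - p)  # in the gold set but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```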

Description

Technical field

[0001] The invention relates to the technical field of multi-label long text classification based on a sequence-to-sequence architecture, and in particular to a multi-label long text classification method introducing a multi-way selection fusion mechanism.

Background

[0002] The following background arises in studying multi-label long text classification based on a sequence-to-sequence architecture. Attention mechanism: the attention mechanism in deep learning is modeled on human visual attention, focusing on a particular part of the input sequence at each step as needed instead of attending to all of it at once. It has been widely used in natural language processing. Attention mechanisms are divided into hard attention and soft attention. The soft attention mechanism assigns an attention weight to each part of the sequence. To calculate the attention weight, first calculate the distribution of each part of the se...
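The soft attention described in [0002] turns a score for each part of the sequence into a distribution of weights and then takes a weighted sum. A minimal pure-Python sketch follows; the scaled dot-product scoring function is an assumption (the Transformer-style choice), since the source text is truncated before it names one.

```python
import math


def softmax(scores):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query, keys, values):
    """Soft attention over a sequence of (key, value) pairs.

    Assumption: each position is scored by the dot product of query and key,
    scaled by 1/sqrt(d). Every position receives a nonzero weight, and the
    weights sum to 1. Returns (weights, context), where context is the
    weighted sum of the value vectors.
    """
    scale = 1 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context
```

Hard attention, by contrast, would select a single position instead of blending all of them, which is why soft attention stays differentiable end to end.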


Application Information

IPC(8): G06F16/35
CPC: G06F16/35
Inventors: 屈鸿 (Qu Hong), 秦展展 (Qin Zhanzhan), 侯帅 (Hou Shuai), 黄鹂 (Huang Li), 张晓敏 (Zhang Xiaomin)
Owner UNIV OF ELECTRONIC SCI & TECH OF CHINA