
Extreme multi-label classification data enhancement method based on label and text block attention mechanism

A technology for extreme multi-label classification using text blocks, applied in digital data processing, natural language data processing, semantic analysis, and related fields. It addresses the poor classification performance on "long-tail" labels, for which existing methods cannot obtain good classification results, and improves classification on those labels.

Pending Publication Date: 2022-03-01
NANKAI UNIV

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to solve the problem that existing techniques cannot obtain good classification results on labels that occur only a small number of times. It provides an extreme multi-label classification data enhancement method based on a label and text block attention mechanism, which augments the data related to infrequent labels and thereby improves the classification performance of various models on such labels.
[0008] The present invention holds that using data enhancement to increase the number of occurrences of "long-tail" labels (labels with low frequency in the data set), and then applying existing extreme multi-label classification methods, is an effective way to address the poor classification performance on "long-tail" labels.
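
As a minimal sketch of what "long-tail" means in practice, the snippet below counts how often each label occurs and returns the infrequent ones. The function name `tail_labels`, the threshold `max_count`, and the toy label sets are assumptions made for illustration; the source does not state a frequency cutoff.

```python
from collections import Counter

def tail_labels(label_sets, max_count):
    """Return the labels that occur at most `max_count` times in the data set."""
    counts = Counter(label for labels in label_sets for label in set(labels))
    return {label for label, n in counts.items() if n <= max_count}

# Toy data (hypothetical): each entry is the label set of one text.
label_sets = [
    {"phone", "electronics"},
    {"phone", "electronics"},
    {"electronics", "charger"},
    {"phone", "case"},
]

# "charger" and "case" each occur once, so they form the tail here.
tail = tail_labels(label_sets, max_count=1)
print(sorted(tail))
```

Labels selected this way are the ones whose related data an enhancement method of this kind would aim to increase.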



Examples


Embodiment 1

[0028] The data enhancement method for extreme multi-label classification based on the label and text block attention mechanism provided by the present invention will be described in detail below with reference to the drawings and specific embodiments.

[0029] The present invention mainly adopts theories and methods from natural language processing. To ensure that the method runs normally, the computer platform used in a specific implementation should have at least 16 GB of memory, a CPU with at least 4 cores and a clock frequency of at least 2.6 GHz, and a Linux operating system, with Python 3.6 or above, the PyTorch framework, and other necessary software installed.
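
The stated requirements can be checked roughly with the Python standard library. The sketch below is an illustration only: the function name `check_environment` and its report keys are hypothetical, "16 GB" is interpreted as 16 × 1024³ bytes (an assumption), and the CPU clock frequency is not checked because the standard library has no portable call for it.

```python
import importlib.util
import os
import sys

def check_environment(min_python=(3, 6), min_cores=4, min_memory_gb=16):
    """Report which of the stated platform requirements appear to be met."""
    report = {"python_ok": sys.version_info >= min_python}

    cores = os.cpu_count()  # may be None if it cannot be determined
    report["cores_ok"] = cores is not None and cores >= min_cores

    try:
        # Works on Linux; physical memory = page size * number of pages.
        total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        report["memory_ok"] = total_bytes >= min_memory_gb * 1024 ** 3
    except (AttributeError, ValueError, OSError):
        report["memory_ok"] = None  # unknown on this platform

    # Only checks that a package named "torch" can be found, not its version.
    report["torch_installed"] = importlib.util.find_spec("torch") is not None
    return report

report = check_environment()
print(report)
```

A machine sold as "16 GB" can report slightly less usable physical memory than 16 × 1024³ bytes, so a `memory_ok` value of `False` near that boundary should be read with care.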

[0030] In steps 1, 2): the original data set can be expressed as $X_N$:

[0031] $$X_N = \{(x_i, y_i)\}_{i=1}^{N}$$

[0032] where $N$ represents the number of data items in the data set, $x_i$ represents a piece of text, and $y_i \in \{0, 1\}^L$ is the label set corresponding to this piece...
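
To make the notation concrete, the following sketch builds a tiny data set of (text, label vector) pairs in which each label set is encoded as a vector in {0, 1}^L. The helper `to_multi_hot`, the mapping `label_index`, and the example texts are hypothetical and serve only to illustrate the representation.

```python
def to_multi_hot(labels, label_index):
    """Encode a set of labels as a 0/1 vector of length L = len(label_index)."""
    vec = [0] * len(label_index)
    for label in labels:
        vec[label_index[label]] = 1
    return vec

# Hypothetical label vocabulary with L = 4 labels.
label_index = {"phone": 0, "electronics": 1, "charger": 2, "case": 3}

# Hypothetical raw data: one text and its label set per entry.
raw = [
    ("great phone, fast delivery", {"phone", "electronics"}),
    ("the charger stopped working", {"electronics", "charger"}),
]

# X_N as a list of (x_i, y_i) pairs with N = 2.
X_N = [(text, to_multi_hot(labels, label_index)) for text, labels in raw]
for x_i, y_i in X_N:
    print(y_i, x_i)
```

In extreme multi-label classification L is very large, so practical implementations usually store these vectors in a sparse format rather than as dense lists.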



Abstract

The invention discloses an extreme multi-label classification data enhancement method based on a label and text block attention mechanism. The method comprises the following steps: selecting an original data set; learning a high-level semantic representation of each word in a text through BERT; segmenting the text into a plurality of text blocks of equal length and averaging the high-level semantic representations of the words in each text block to obtain a representation of the whole block; calculating the relevance between the representation of each text block and the vector representation of a label through an attention mechanism and fusing the representations of all text blocks; obtaining a complete label-text block relation model after training; performing data enhancement according to the relevance; and finally outputting the new, enhanced data set. The method takes the relationship between labels and text blocks into account, uses the model to learn the relevance between labels and text, and replaces unimportant text blocks in the original data with text blocks associated with long-tail labels. The multi-label classification performance of various existing models on the new data set is remarkably improved.
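
The block-averaging, attention, fusion, and replacement steps described in the abstract can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: the word vectors are hand-written stand-ins for BERT outputs, relevance is assumed to be a softmax over dot products, and the replacement rule (swap the target's lowest-weight block for the donor's highest-weight block) is one possible reading of "replacing unimportant text blocks". All function names are hypothetical.

```python
import math

def block_representations(word_vectors, block_len):
    """Average consecutive groups of `block_len` word vectors into block vectors."""
    blocks = []
    for start in range(0, len(word_vectors), block_len):
        chunk = word_vectors[start:start + block_len]
        dim = len(chunk[0])
        blocks.append([sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)])
    return blocks

def label_block_attention(label_vector, blocks):
    """Assumed relevance: softmax over dot products of the label with each block."""
    scores = [sum(a * b for a, b in zip(label_vector, block)) for block in blocks]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_blocks(blocks, weights):
    """Fuse block vectors into one vector as their attention-weighted sum."""
    dim = len(blocks[0])
    return [sum(w * block[d] for w, block in zip(weights, blocks)) for d in range(dim)]

def replace_least_relevant(target_blocks, target_weights, donor_blocks, donor_weights):
    """Swap the target's lowest-weight block for the donor's highest-weight block."""
    drop = min(range(len(target_weights)), key=target_weights.__getitem__)
    take = max(range(len(donor_weights)), key=donor_weights.__getitem__)
    augmented = list(target_blocks)
    augmented[drop] = donor_blocks[take]
    return augmented

# Stand-in word vectors for a 4-word text (a real system would take them from BERT).
word_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
blocks = block_representations(word_vectors, block_len=2)
label_vector = [2.0, 0.0]  # hypothetical vector for a long-tail label
weights = label_block_attention(label_vector, blocks)
fused = fuse_blocks(blocks, weights)

# Text-level replacement on tokenized blocks (toy tokens, hypothetical weights).
donor_text = [["usb", "charger"], ["works", "fine"]]
target_text = [["fast", "delivery"], ["thanks", "again"]]
augmented = replace_least_relevant(target_text, [0.7, 0.3], donor_text, weights)
print(weights, fused, augmented)
```

The sketch leaves out training of the label-text block relation model and does not decide which labels the augmented sample should carry, since the abstract does not specify either. If the text length is not a multiple of `block_len`, the last block here is simply shorter, which is also an assumption.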

Description

Technical field

[0001] The invention belongs to the technical field of computer applications and relates to data mining and extreme multi-label classification, in particular to an extreme multi-label classification data enhancement method based on a label and text block attention mechanism.

Background art

[0002] In recent years, with the rapid development of the Internet, platforms such as social media and e-commerce websites have accumulated a large amount of labeled text data. Because the label set is very large, the extreme multi-label classification task is to find the labels most relevant to a text from a very large label set. Data mining through extreme multi-label classification tasks is of great significance to the development of various industries. For example, the analysis of commodity evaluation data on e-commerce websites can help merchants understand consumers' purchasing tendencies, and then provide them with effective decision ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F40/30, G06F40/284
CPC: G06F40/284, G06F40/30, G06F18/241, Y02D10/00
Inventor: 刘杰, 张嘉鑫
Owner: NANKAI UNIV