Small sample multi-label text classification model training method and text classification method

A multi-label text classification technology, applied in the computer field, that addresses the high labor and time cost of data labeling and the small number of labeled texts per category, improving training efficiency while reducing labor and time costs.

Active Publication Date: 2022-07-29
PEOPLE CN CO LTD

AI Technical Summary

Problems solved by technology

However, the high accuracy of deep learning depends on a large amount of labeled data, and data labeling requires substantial labor and time. In many cases, large-scale labeled data is not available. When the number of text categories is large, the number of labeled texts belonging to each category follows a long-tailed distribution, and most categories contain very few texts. Therefore, there is an urgent need for a text classification scheme based on small-scale labeled texts.

Embodiment Construction

[0038] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that a more thorough understanding of the present invention will be provided, and will fully convey the scope of the present invention to those skilled in the art.

[0039] The inventor of the present invention found that, in the prior art, data augmentation can be used to expand data. Existing data augmentation methods generally use techniques such as synonym replacement, word-order reversal, and back translation to modify the original data and obtain new data whose semantics are unchanged; the objects being modified are the labeled original data. Since the sentence vector...
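
The conventional augmentation techniques named in paragraph [0039] can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the synonym table is a toy, "word-order reversal" is interpreted here as swapping one pair of adjacent words, and back translation is omitted because it needs a translation model. The point it shows is that each variant is derived from an already labeled text and inherits that text's labels.

```python
import random

# Toy synonym table (hypothetical); a real system would use a thesaurus.
SYNONYMS = {"raised": ["increased"], "rates": ["levels"]}


def synonym_replace(words, rng):
    """Replace each word that has a known synonym with one chosen at random."""
    return [rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words]


def swap_adjacent(words, rng):
    """Swap one randomly chosen pair of neighbouring words."""
    if len(words) < 2:
        return list(words)
    i = rng.randrange(len(words) - 1)
    swapped = list(words)
    swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped


def augment(text, labels, seed=0):
    """Return modified copies of a labeled text; labels are carried over unchanged."""
    rng = random.Random(seed)
    words = text.split()
    variants = [synonym_replace(words, rng), swap_adjacent(words, rng)]
    return [(" ".join(v), set(labels)) for v in variants]


RESULT = augment("the bank raised interest rates", {"finance"})
```

Because every variant starts from a labeled original, this kind of augmentation cannot produce more variety than the small labeled set already contains.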

Abstract

The invention discloses a training method for a small-sample multi-label text classification model, a small-sample multi-label text classification method and device, a computing device, and a computer storage medium. In the method, key phrases are extracted from the labeled sample texts corresponding to each text label, the prompt template is expanded according to the key phrases, and data augmentation is performed based on the prompt template. This enables model training from a small number of labeled sample texts, overcomes the difficulty of obtaining large-scale labeled texts, and improves the user experience. Moreover, the training efficiency of the text classification model is improved, lengthy manual labeling is no longer needed, and labor and time costs are reduced.
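
The pipeline summarized in the abstract (extract key phrases per label, expand a prompt template with them, use the result as augmented data) might look roughly like the following Python sketch. It is not the patented implementation: the frequency-based bigram extractor is a stand-in for whatever key-phrase extraction the invention uses, and the template wording, the stopword list, and the names `key_phrases`, `augment`, and `TEMPLATES` are all hypothetical.

```python
from collections import Counter

# Toy labeled samples: each text carries one or more labels (multi-label).
LABELED = [
    ("the central bank raised interest rates again", {"finance"}),
    ("interest rates weigh on the stock market", {"finance"}),
    ("the football team won the league title", {"sports"}),
    ("league title race lifts club share price", {"sports", "finance"}),
]

STOPWORDS = {"the", "on", "again", "a", "of"}

# Hypothetical prompt templates; the source does not give their wording.
TEMPLATES = ["This text is about {phrase}.", "A news item mentioning {phrase}."]


def key_phrases(texts, top_k=2):
    """Return the top_k most frequent bigrams that contain no stopword."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return [phrase for phrase, _ in counts.most_common(top_k)]


def augment(labeled, templates, top_k=2):
    """Fill every template with every key phrase extracted for every label."""
    by_label = {}
    for text, labels in labeled:
        for label in labels:
            by_label.setdefault(label, []).append(text)
    augmented = []
    for label in sorted(by_label):
        for phrase in key_phrases(by_label[label], top_k):
            for template in templates:
                augmented.append((template.format(phrase=phrase), {label}))
    return augmented


AUGMENTED = augment(LABELED, TEMPLATES)
```

In this sketch the augmented pairs would simply be appended to the original labeled samples before training; how the invention actually feeds the expanded prompts to the classifier is not stated in the abstract.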

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a training method for a small-sample multi-label text classification model, a small-sample multi-label text classification method, a device, a computing device, and a computer storage medium.

Background

[0002] Text classification is a classic task in natural language processing that aims to have machines automatically label a text with one or more predefined class labels. After the rise of deep learning, text classification research made great progress. However, the high accuracy of deep learning relies on a large amount of labeled data, and data labeling requires substantial labor and time. In many cases, large-scale labeled data is not available. When the number of text categories is large, the number of labeled texts belonging to each category follows a long-tailed distribution, and the number of texts in most c...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F40/289, G06F40/211
CPC: G06F40/211, G06F40/289, G06F18/241, G06F18/214
Inventor: 刘殿卿, 徐向春, 郭俊波, 靳国庆, 刘乃榕, 王海燕
Owner: PEOPLE CN CO LTD