
A Text Classification Method Based on Few Samples

A few-sample text classification technology, applied in text database clustering/classification, unstructured text data retrieval, instruments, etc. It addresses two problems: classification is inaccurate when a model is trained on few samples, and enlarging the training set normally requires a large amount of manual annotation. The method improves classification accuracy while avoiding the labor and time of manual labeling.

Active Publication Date: 2021-06-18
成都数联铭品科技有限公司

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to solve two problems: first, classification models trained on few samples are inaccurate; second, enlarging the training set requires a large amount of manual annotation. To this end, the invention provides a text classification method based on few samples.



Examples


Embodiment 2

[0055] Building on embodiment 1, this embodiment provides an illustrative example:

[0056] Suppose there is a labeled set of finance-related data, data set a, whose categories are known: the 9 categories listed in Table 1 (m=9), with 873 data items in total (n = 873 / 9 = 97). In practice the categories do not contain equal numbers of items, so n is the average number of items per category.

[0057]

[0058] Use Chinese-English, Chinese-Japanese, and Chinese-Korean translation tools to translate data set a in Table 1, obtaining data set b with 9*97*(3+1)=3492 data items, as shown in Table 2:
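The expansion step in [0058] can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the `expand_dataset` helper, the placeholder translators (which only prefix a language tag instead of calling a real translation tool), and the synthetic texts are all assumptions introduced here. The point is that every translated copy inherits the label of its source item, so no new manual annotation is needed.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]  # (text, label)


def expand_dataset(
    data_a: List[Sample],
    translators: Dict[str, Callable[[str], str]],
) -> List[Sample]:
    """Return data set b: each original item plus one translation per tool.

    Every translated copy reuses the label of its source item.
    """
    data_b: List[Sample] = []
    for text, label in data_a:
        data_b.append((text, label))  # keep the original (the "+1")
        for translate in translators.values():
            data_b.append((translate(text), label))
    return data_b


# Placeholder translators (assumption): a real system would call
# Chinese-English, Chinese-Japanese, and Chinese-Korean translation tools.
translators = {
    "en": lambda text: "[en] " + text,
    "ja": lambda text: "[ja] " + text,
    "ko": lambda text: "[ko] " + text,
}

# Synthetic stand-in for data set a: m = 9 categories, n = 97 items each.
m, n = 9, 97
data_a = [(f"text {c}-{i}", f"category_{c}") for c in range(m) for i in range(n)]

data_b = expand_dataset(data_a, translators)
print(len(data_a), len(data_b))  # 873 3492
```

With z = 3 tools the expanded set holds m * n * (z + 1) items, which matches the 3492 figure given above.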

[0059]

[0060] After data set b is encoded using the BERT pre-trained model corresponding to each translation tool, the vector set V is obtained. The vector set V is then input into the TextCNN classification model for training until the model converges, and the trained model can be used for class...
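The text here does not spell out the internals of TextCNN. As commonly described, a TextCNN slides convolution filters of a few widths over the sequence of token vectors, applies a nonlinearity, and keeps each filter's maximum response (max-over-time pooling) before a final classification layer. The pure-Python sketch below shows only that feature-extraction step. The token vectors, filter weights, and function names are illustrative assumptions, not values from the patent.

```python
from typing import List

Vector = List[float]


def conv_max_pool(tokens: List[Vector], kernel: List[Vector], bias: float = 0.0) -> float:
    """Apply one filter of width len(kernel) at every position, then pool.

    Returns max(0, largest filter response), i.e. ReLU followed by
    max-over-time pooling.
    """
    width = len(kernel)
    if len(tokens) < width:
        raise ValueError("sequence is shorter than the filter width")
    best = 0.0  # a ReLU output is never below zero
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        response = bias + sum(
            t * k
            for tok, ker in zip(window, kernel)
            for t, k in zip(tok, ker)
        )
        best = max(best, response)
    return best


def textcnn_features(tokens: List[Vector], kernels: List[List[Vector]]) -> List[float]:
    """One pooled feature per filter; a classifier layer would consume these."""
    return [conv_max_pool(tokens, kernel) for kernel in kernels]


# Toy input (assumption): three tokens with 2-dimensional vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],  # width-2 filter
    [[-1.0, -1.0]],            # width-1 filter
]
print(textcnn_features(tokens, kernels))  # [2.0, 0.0]
```

In a real model the filter weights are learned during training and the token vectors would come from the BERT encoder rather than being written by hand.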



Abstract

The present invention relates to a text classification method based on few samples, comprising the following steps: using z translation tools to translate each piece of data in data set a z times to obtain an expanded data set b; using a BERT pre-trained model to encode the expanded data set b to obtain a vector set V; and using the vector set V as training set x and the labels of data set a as training set y, inputting training set x and training set y together into a classification model, which is trained until a converged classification model is obtained. The invention greatly expands the original few-sample data to increase the number of training samples without adding any manual labeling. It therefore solves the problem of inaccurate classification when training on few samples, and also avoids the labor and time that manual labeling would require.
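The final step, training on (x, y) until convergence, can be sketched as follows. This is a hedged illustration only: a small softmax-regression classifier trained by gradient descent stands in for the classification model, the hand-written 2-dimensional vectors stand in for BERT encodings, and the stopping rule (loss improvement below a tolerance) is one assumed way to decide that the model has converged. None of these specifics come from the patent.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scores_for(weights: List[List[float]], vec: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def train_until_converged(x, y, num_classes, lr=0.5, tol=1e-6, max_epochs=5000):
    """Gradient descent on mean cross-entropy; stop when the loss stops improving."""
    dim = len(x[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    prev_loss = float("inf")
    for _ in range(max_epochs):
        grads = [[0.0] * dim for _ in range(num_classes)]
        loss = 0.0
        for vec, label in zip(x, y):
            probs = softmax(scores_for(weights, vec))
            loss -= math.log(probs[label])
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == label else 0.0)
                for d in range(dim):
                    grads[c][d] += err * vec[d]
        loss /= len(x)
        if prev_loss - loss < tol:  # assumed convergence criterion
            break
        prev_loss = loss
        for c in range(num_classes):
            for d in range(dim):
                weights[c][d] -= lr * grads[c][d] / len(x)
    return weights


def predict(weights: List[List[float]], vec: List[float]) -> int:
    scores = scores_for(weights, vec)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy stand-ins (assumption): x plays the role of vector set V, and y holds
# the labels of data set a, repeated for each translated copy.
x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [0, 0, 1, 1]
weights = train_until_converged(x, y, num_classes=2)
print(predict(weights, [1.0, 0.0]), predict(weights, [0.0, 1.0]))  # 0 1
```

Swapping in a different classifier changes only `train_until_converged` and `predict`; the pairing of encoded vectors with the inherited labels stays the same.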

Description

Technical field

[0001] The invention relates to the technical field of text classification, and in particular to a text classification method based on few samples.

Background technique

[0002] Text classification, or automatic text classification, refers to the process in which a computer maps a text carrying information to one or more predetermined topic categories. Texts include news, articles, written works, novels, notices, etc. For example, a piece of news can be classified as sports news, entertainment news, current affairs and political news, or a weather forecast; a novel can be classified as a romance, martial arts, or suspense novel. Text classification is thus a natural language processing task and a technical application field for processing semantic information.

[0003] The mainstream traditional deep lear...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/126, G06F40/58
CPC: G06F16/35, G06F40/126, G06F40/58
Inventor: 刘世林, 罗镇权, 黄艳, 曾途
Owner: 成都数联铭品科技有限公司