Text classification model training method and device, storage medium and computer equipment

A text classification and training-method technology, applied in text database clustering/classification, computing, unstructured text data retrieval, etc. Its effects are improved labeling efficiency and a reduced amount of labeling.

Pending Publication Date: 2020-08-11
大箴(杭州)科技有限公司
6 cites, 1 cited by

AI Technical Summary

Problems solved by technology

In large-scale SMS text classification tasks, the volume of message text is huge, and the message templates seen each day are varied and unevenly distributed. Direct sampling mainly causes two problems: template coverage is low, and the amount of data falling into each category is skewed. Both make text annotation and model training considerably harder.
In addition, because SMS templates differ somewhat from day to day, a model trained on labeled data from a fixed set of days cannot generalize to all unseen SMS templates.
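The coverage problem described above can be illustrated with a small simulation. This is only a sketch on synthetic data: the 50 templates, their 1/(i+1) volume weights, and the annotation budget of 50 are assumptions made for illustration and do not come from the patent. It compares how many distinct templates end up in the annotation sample under direct sampling versus drawing one message per template group.

```python
import random
from collections import Counter


def coverage(sampled_templates, all_templates):
    """Fraction of distinct templates that appear at least once in the sample."""
    return len(set(sampled_templates)) / len(set(all_templates))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic corpus (assumption): 50 templates with heavily skewed volume.
# Template i gets weight 1/(i+1), so a few templates dominate the traffic.
template_ids = list(range(50))
weights = [1.0 / (i + 1) for i in template_ids]
corpus = rng.choices(template_ids, weights=weights, k=20_000)

budget = 50  # number of messages we can afford to annotate (assumption)

# Direct sampling: draw the whole budget uniformly from the corpus.
direct = rng.sample(corpus, budget)

# Group-aware sampling: split the budget evenly across template groups.
groups = {}
for t in corpus:
    groups.setdefault(t, []).append(t)
per_group = max(1, budget // len(groups))
grouped = [
    m
    for members in groups.values()
    for m in rng.sample(members, min(per_group, len(members)))
]

direct_cov = coverage(direct, corpus)
grouped_cov = coverage(grouped, corpus)
print(f"direct sampling coverage:    {direct_cov:.2f}")
print(f"per-group sampling coverage: {grouped_cov:.2f}")
print("most frequent templates in the direct sample:", Counter(direct).most_common(3))
```

In this toy setup the template identity is known in advance; in practice the groups would have to be discovered, for example by clustering the messages.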

Examples

Detailed Description of Embodiments

[0060] Hereinafter, the present application will be described in detail with reference to the drawings and embodiments. It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other.

[0061] This embodiment provides a training method for a text classification model. As shown in Figure 1, the method includes:

[0062] Step 101, clustering the first text samples to obtain at least one first text cluster;

[0063] Step 102, obtaining a text label corresponding to each first text cluster, based on a first preset number of first text samples obtained from that first text cluster;

[0064] Step 103, respectively acquiring a second preset number of first text samples from each first text cluster as first training samples;

[0065] Step 104: Establish a first training set based on the first training samples and their corresponding text labels, and train th...
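Steps 101 to 104 can be sketched end to end as follows. This is a minimal illustration, not the patent's implementation: the excerpt does not name a clustering algorithm or a classifier, so digit masking stands in for clustering and a small multinomial naive Bayes model stands in for the text classification model. The `annotate` function is a hypothetical stand-in for a human annotator, taking the majority vote as the cluster label is an assumption, and the toy messages are invented.

```python
import math
import random
import re
from collections import Counter, defaultdict


def template_key(text):
    """Stand-in for clustering: mask digit runs so messages from one template collide."""
    return re.sub(r"\d+", "<NUM>", text.lower())


def cluster_texts(texts):
    """Step 101: group the first text samples into first text clusters."""
    clusters = defaultdict(list)
    for t in texts:
        clusters[template_key(t)].append(t)
    return list(clusters.values())


def label_clusters(clusters, annotate, n_label, rng):
    """Step 102: annotate n_label samples per cluster; keep the majority label (assumption)."""
    labels = []
    for members in clusters:
        picked = rng.sample(members, min(n_label, len(members)))
        votes = Counter(annotate(t) for t in picked)
        labels.append(votes.most_common(1)[0][0])
    return labels


def build_training_set(clusters, labels, n_train, rng):
    """Steps 103-104: draw n_train samples per cluster and attach the cluster's label."""
    pairs = []
    for members, label in zip(clusters, labels):
        for t in rng.sample(members, min(n_train, len(members))):
            pairs.append((t, label))
    return pairs


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(pairs):
    """Step 104: fit a multinomial naive Bayes model (stand-in classifier)."""
    class_counts = Counter(label for _, label in pairs)
    word_counts = defaultdict(Counter)
    for text, label in pairs:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def annotate(text):
    """Hypothetical stand-in for a human annotator."""
    if "code" in text:
        return "verification"
    if "order" in text:
        return "logistics"
    return "marketing"


rng = random.Random(0)
# Invented toy messages from three templates with uneven volume.
texts = (
    [f"Your verification code is {rng.randint(1000, 9999)}" for _ in range(30)]
    + [f"Your order {rng.randint(100, 999)} has shipped" for _ in range(20)]
    + [f"Sale today save {rng.randint(10, 50)} percent" for _ in range(5)]
)

clusters = cluster_texts(texts)
labels = label_clusters(clusters, annotate, n_label=2, rng=rng)
train_pairs = build_training_set(clusters, labels, n_train=5, rng=rng)
model = train_naive_bayes(train_pairs)
print(len(clusters), "clusters,", len(train_pairs), "training samples")
print(predict(model, "Your verification code is 4321"))
```

Here only 2 messages per cluster are shown to the annotator, yet 5 per cluster enter the training set with the cluster's label, which is how labeling per cluster rather than per message reduces annotation effort.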

Abstract

The invention discloses a text classification model training method, a text classification model training device, a storage medium and computer equipment. The text classification model training method comprises the steps of: carrying out clustering on first text samples, and acquiring at least one first text cluster; based on a first preset number of first text samples obtained from each first text cluster, acquiring a text label corresponding to each first text cluster; acquiring a second preset number of first text samples from each first text cluster to serve as first training samples; and establishing a first training set based on the first training samples and text labels corresponding to the first training samples, and training a text classification model. According to the text classification model training method and the text classification model training device, the texts are clustered, so that the coverage rate of a template is improved while the annotation amount is reduced, the annotation efficiency is greatly improved, and the model effect is further improved.

Description

Technical field

[0001] The present application relates to the technical field of text classification, in particular to a text classification model training method, device, storage medium and computer equipment.

Background

[0002] Text classification tasks in natural language processing require a large amount of labeled text to train the classification model. In prior-art data labeling systems, platforms and methods, the data is first sampled and then labeled, and the labeled data is used to train the model. Based on the training results, the next round of labeling, training, and tuning is performed. In large-scale SMS text classification tasks, the volume of message text is huge, and the message templates seen each day are varied and unevenly distributed. Direct sampling mainly causes two problems: one is low template coverage, and the other is a bias in the amount of data fallin...

Application Information

IPC(8): G06F16/35, G06F40/279
CPC: G06F16/35, G06F40/279
Inventor: 林连升
Owner: 大箴(杭州)科技有限公司