
Model training method, text classification method, system, equipment and medium

A technology for model training and text classification, applied in the field of deep learning, which addresses the problems of inaccurate classification and the introduction of noisy data, achieving the effects of improved accuracy, increased sample diversity, and alleviated data imbalance

Pending Publication Date: 2022-04-08
携程旅游信息技术(上海)有限公司

AI Technical Summary

Problems solved by technology

[0003] The technical problem to be solved by the present invention is to overcome the defects of the prior art in multi-label text classification, namely inaccurate classification and the introduction of noisy data, by providing a model training method, a text classification method, a system, equipment, and a medium.



Examples


Embodiment 1

[0071] As shown in Figure 1, this embodiment provides a model training method, including:

[0072] Step 101, obtaining an initial sample data set, wherein the initial sample data set includes multi-label text;

[0073] Step 102, obtaining samples to be enhanced according to the initial sample data set;

[0074] Step 103, performing text data enhancement processing on the samples to be enhanced to obtain multi-label text training samples;

[0075] Step 104, calculating a new loss function based on the first loss function and the second loss function;

[0076] Step 105, training the neural network model based on multi-label text training samples and a new loss function to obtain a multi-label text classification model;
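Steps 101 to 105 above can be sketched end to end. The following is a hypothetical, self-contained illustration only: the selection rule (rare-label filtering) and the augmentation (word-order reversal) are stand-ins chosen for brevity, not the techniques claimed by the patent.

```python
from collections import Counter

def select_samples_to_enhance(dataset, min_count=2):
    # Step 102 (illustrative rule): pick samples carrying a label that
    # appears fewer than min_count times in the data set
    counts = Counter(label for _, labels in dataset for label in labels)
    return [(text, labels) for text, labels in dataset
            if any(counts[l] < min_count for l in labels)]

def augment_text(samples):
    # Step 103 (toy augmentation): simple word-order reversal as a stand-in
    # for the patent's multiple data enhancement algorithms
    return [(" ".join(reversed(text.split())), labels) for text, labels in samples]

# Step 101: initial sample data set of (text, label-set) pairs
dataset = [("great hotel room", {"hotel"}),
           ("cheap flight deal", {"flight"}),
           ("hotel near airport", {"hotel", "airport"})]

augmented = augment_text(select_samples_to_enhance(dataset))
```

Only the two rare-label samples are augmented here, so the minority labels gain extra training examples while the well-covered label is left alone.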

[0077] In this embodiment, the first loss function is a CE Loss function, and the second loss function is a KL Loss function.

[0078] In this embodiment, the CE Loss function is used to measure the loss between the predicted classification result and the true class...
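The embodiment combines a CE (cross-entropy) loss with a KL (Kullback-Leibler divergence) loss into a new loss. As one hedged reading, the sketch below forms a weighted sum `CE + alpha * KL`; the combination rule, the reference distribution for the KL term, and the weight `alpha` are assumptions, since the excerpt does not specify them.

```python
import math

def cross_entropy(pred_probs, true_dist):
    # CE loss between predicted probabilities and the true label distribution
    return -sum(t * math.log(p) for p, t in zip(pred_probs, true_dist) if t > 0)

def kl_divergence(p, q):
    # KL(p || q) between two discrete probability distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def combined_loss(pred_probs, true_dist, ref_probs, alpha=1.0):
    # Assumed combination: new loss = CE + alpha * KL (ref_probs is a
    # hypothetical reference distribution, e.g. a second forward pass)
    return cross_entropy(pred_probs, true_dist) + alpha * kl_divergence(pred_probs, ref_probs)
```

With `alpha = 0` the new loss reduces to plain cross-entropy, and the KL term vanishes whenever the prediction matches the reference distribution exactly.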

Embodiment 2

[0101] As shown in Figure 4, this embodiment provides a model training system, including a first acquisition module 1, a second acquisition module 2, a processing module 3, a calculation module 4, and a training module 5;

[0102] The first acquisition module 1 is used to acquire an initial sample data set, wherein the initial sample data set includes multi-label text;

[0103] The second acquisition module 2 is used to obtain samples to be enhanced according to the initial sample data set;

[0104] The processing module 3 is used to perform text data enhancement processing on the samples to be enhanced to obtain multi-label text training samples;

[0105] The calculation module 4 is used to calculate a new loss function based on the first loss function and the second loss function;

[0106] The training module 5 is used to train the neural network model based on the multi-label text training samples and the new loss function to obtain a multi-label text classification model;
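The five modules of Figure 4 can be mirrored as a small class whose constructor takes one callable per module. All bodies below are illustrative stand-ins for the data flow, not the patented implementations.

```python
# Hypothetical class-based sketch of the five-module training system.
# Module boundaries follow paragraphs [0102]-[0106]; every callable here
# is a placeholder.
class ModelTrainingSystem:
    def __init__(self, acquire_data, select_samples, augment, combine_losses, train):
        self.acquire_data = acquire_data        # first acquisition module 1
        self.select_samples = select_samples    # second acquisition module 2
        self.augment = augment                  # processing module 3
        self.combine_losses = combine_losses    # calculation module 4
        self.train = train                      # training module 5

    def run(self):
        data = self.acquire_data()
        samples = self.select_samples(data)
        training_samples = self.augment(samples)
        loss_fn = self.combine_losses()
        return self.train(training_samples, loss_fn)

# Toy wiring to show the data flow through the modules
system = ModelTrainingSystem(
    acquire_data=lambda: ["text a", "text b"],
    select_samples=lambda d: d[:1],
    augment=lambda s: s + [t.upper() for t in s],
    combine_losses=lambda: (lambda pred, true: 0.0),
    train=lambda samples, loss: ("model", samples),
)
result = system.run()
```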

[01...

Embodiment 3

[0131] Figure 5 is a schematic structural diagram of an electronic device provided by Embodiment 3 of the present invention. The electronic device includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; the model training method of Embodiment 1 is implemented when the processor executes the program. The electronic device 30 shown in Figure 5 is only an example and should not limit the functions and scope of use of the embodiments of the present invention.

[0132] As shown in Figure 5, the electronic device 30 may take the form of a general-purpose computing device, which may be, for example, a server device. Components of the electronic device 30 may include, but are not limited to: at least one processor 31, at least one memory 32, and a bus 33 connecting different system components (including the memory 32 and the processor 31).

[0133] The bus 33 includes a data bus, an address bus, and a control bus.

[01...



Abstract

The invention discloses a model training method, a text classification method, a system, equipment, and a medium. The method comprises the following steps: acquiring an initial sample data set; obtaining samples to be enhanced according to the initial sample data set; performing text data enhancement processing on the samples to be enhanced to obtain multi-label text training samples; calculating a new loss function based on a first loss function and a second loss function; and training a neural network model based on the multi-label text training samples and the new loss function to obtain a multi-label text classification model. In the method, the samples to be enhanced undergo data enhancement with multiple different data enhancement algorithms, which increases the diversity of the enhanced samples; combined with training on the new loss function, the resulting multi-label text classification model classifies texts more efficiently, the problem of data imbalance in multi-label classification is alleviated, and the accuracy of the multi-label text classification model is improved.

Description

Technical field

[0001] The present invention relates to the technical field of deep learning, and in particular to a model training method, a text classification method, a system, equipment, and a medium.

Background technique

[0002] In current multi-label text classification scenarios, there is a problem of unbalanced data: some labels have thousands of samples, while others have only a few dozen, or even just a few. Approaches to alleviating this imbalance fall into two categories. At the model level, the loss function is modified so that the model assigns more weight to minority labels, as in focal loss; however, model-level approaches suffer from inaccurate classification. At the data level, the approaches are sampling and data enhancement; although data-level approaches expand the data set, they also introduce noisy samples.

Contents of the invention

[0003] The technical problem to be solved by the present invention is to prov...
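The background names focal loss as the typical model-level remedy for label imbalance. For reference, a minimal binary focal loss (after Lin et al., without the optional class-balancing factor) looks like the sketch below; `gamma` controls how strongly easy examples are down-weighted, and with `gamma = 0` it reduces to ordinary cross-entropy.

```python
import math

def focal_loss(p, y, gamma=2.0):
    # Binary focal loss for a single label: p is the predicted probability
    # of the positive class, y is the ground-truth label (0 or 1).
    # The (1 - pt)^gamma factor shrinks the loss on easy, well-classified
    # examples so rare/hard labels receive relatively more gradient.
    pt = p if y == 1 else 1.0 - p
    return -((1.0 - pt) ** gamma) * math.log(pt)
```

A confident correct prediction thus contributes far less loss than a marginal one, which is how the model is pushed toward minority labels.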

Claims


Application Information

IPC(8): G06F16/35, G06F16/33, G06F40/30, G06N3/04
Inventors: 杨森, 罗超, 江小林, 邹宇
Owner 携程旅游信息技术(上海)有限公司