
Data sample expansion method and device and electronic equipment

A data sample expansion technology, applied in the field of data processing, that addresses the poor model accuracy and robustness caused by manually labeled samples that are slow and costly to acquire and few in number.

Pending Publication Date: 2021-05-11
BEIJING DIDI INFINITY TECH & DEV

AI Technical Summary

Problems solved by technology

[0002] In text classification, a model built by machine learning can be used to classify text. To improve the accuracy and robustness of the model, a large amount of sample data is required for training. This sample data comes from manual labeling, but because manually labeled samples take a long time to acquire, are costly, and are few in number, the accuracy and robustness of the trained model are poor.

Examples

Detailed Description of Embodiments

[0030] The present disclosure is described below based on examples, but it is not limited to these examples. The following detailed description sets forth some specific details. Those skilled in the art can fully understand the present disclosure without these details. To avoid obscuring the essence of the present disclosure, well-known methods, procedures, processes, components and circuits are not described in detail.

[0031] Additionally, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.

[0032] Unless the context clearly requires otherwise, words like "comprising" and "including" throughout the application documents should be interpreted in an inclusive sense rather than an exclusive or exhaustive sense; that is, in the sense of "including but not limited to".

[0033] I...

Abstract

Embodiments of the invention disclose a data sample expansion method and device, a readable storage medium, and electronic equipment. A base model determines the prediction probability of each piece of non-manually-labeled sample data, and the prediction category of each piece is then determined from that probability and a preset probability threshold. Because the non-manually-labeled sample data already carries a pre-labeled sample category, a piece of data is retained when its sample category is the same as its prediction category. The retained non-manually-labeled sample data is then merged with the manually labeled sample data to generate an expanded sample data set. In this way, agreement between the prediction category and the sample category identifies the non-manually-labeled sample data with high confidence, merging it with the manually labeled sample data increases the number of samples, and training the base model on the expanded sample data can improve the accuracy of the base model.
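The filtering and merging steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the binary label scheme (0 or 1), the names `expand_samples` and `toy_base_model`, the keyword-based stand-in for a trained base model, and the 0.5 threshold are all hypothetical and do not come from the source.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: each sample is (text, label) with binary labels 0 or 1.
Sample = Tuple[str, int]


def expand_samples(
    manual: Sequence[Sample],
    non_manual: Sequence[Sample],
    predict_proba: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Sample]:
    """Keep non-manually-labeled samples whose pre-assigned label agrees
    with the base model's thresholded prediction, then merge them with
    the manually labeled samples."""
    retained: List[Sample] = []
    for text, pre_label in non_manual:
        probability = predict_proba(text)  # assumed to be P(label == 1)
        predicted = 1 if probability >= threshold else 0
        if predicted == pre_label:
            retained.append((text, pre_label))
    return list(manual) + retained


def toy_base_model(text: str) -> float:
    """Hypothetical stand-in for a trained base model."""
    return 0.9 if "refund" in text.lower() else 0.1


manual = [("I want a refund", 1), ("Great ride", 0)]
non_manual = [("Refund please", 1), ("Nice driver", 1), ("Thanks", 0)]

expanded = expand_samples(manual, non_manual, toy_base_model, threshold=0.5)
print(len(expanded))
```

In this toy run, "Nice driver" is dropped because its pre-assigned label (1) disagrees with the stand-in model's prediction (0). In practice the expanded set would then be used to retrain the base model, as the abstract describes.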

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to a method, device and electronic equipment for expanding data samples.

Background

[0002] In text classification, a model built by machine learning can be used to classify text. To improve the accuracy and robustness of the model, a large amount of sample data is required for training. This sample data comes from manual labeling, but because manually labeled samples take a long time to acquire, are costly, and are few in number, the accuracy and robustness of the trained model are poor.

[0003] To sum up, how to expand the sample data and improve the accuracy and robustness of the model is a problem that currently needs to be solved.

Summary of the invention

[0004] In view of this, the embodiments of the present invention provide a data sample expansion method, ...

Application Information

IPC(8): G06K9/62, G06F16/35, G06N20/00
CPC: G06F16/355, G06N20/00, G06F18/2148, G06F18/24323, G06F18/2415
Inventor: 尹从丽
Owner: BEIJING DIDI INFINITY TECH & DEV