
Method and system for assisting in labeling model training data

A technology for assisting in labeling model training data that improves labeling efficiency and labeling quality and reduces labeling costs.

Pending Publication Date: 2022-04-12
南京星云数字技术有限公司

AI Technical Summary

Problems solved by technology

Although more human resources can be invested to alleviate this problem, labeling by a large number of ordinary people with different backgrounds, different ideas, and no professional training is bound to introduce a lot of "noise" into the labeling results (that is, the labeled data mentioned above), and this noise has a non-negligible impact on the training of subsequent algorithm models.



Examples


Embodiment 1

[0058] Referring to Figure 1, this embodiment of the present invention discloses a method for assisting in labeling model training data, including:

[0059] S1. Construct two training sets, wherein each training set includes a plurality of training data items, and each item is obtained by sampling from the data pool to be labeled and then being labeled by an operator;

[0060] S2. Train two classifiers in one-to-one correspondence with the training sets, use each classifier to predict the categories of the training data in the other classifier's training set, obtain the error data whose predicted category is inconsistent with the category labeled by the operator, at the same time obtain the training data of undefined category, and extract error features from the error data;

[0061] S3. Use the two classifiers to predict on the data pool to be labeled at the same time, and filter and retain the contradictory data for which the prediction results of the two classifiers are inconsistent, a...
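The cross-checking in S2 and the disagreement filter in S3 can be illustrated with a minimal sketch. The text above does not say what kind of classifier is used, so the nearest-centroid classifier here is purely an assumption chosen to keep the example dependency-free, and all function names and the toy data are hypothetical. Extraction of error features and the truncated remainder of S3 are not covered.

```python
from collections import defaultdict


def train_centroid_classifier(training_set):
    """Fit a nearest-centroid classifier on (features, label) pairs.

    Assumption: a stand-in for whatever classifier the method actually uses.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in training_set:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

    def predict(features):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))
        return min(centroids, key=lambda lab: sq_dist(centroids[lab]))

    return predict


def find_error_data(classifier, other_training_set):
    """S2: items whose predicted category differs from the operator's label."""
    return [(f, lab) for f, lab in other_training_set if classifier(f) != lab]


def find_contradictory_data(clf_a, clf_b, pool):
    """S3: unlabeled pool items on which the two classifiers disagree."""
    return [f for f in pool if clf_a(f) != clf_b(f)]


# Hypothetical toy data: two operator-labeled training sets and an unlabeled pool.
set_a = [((0.0, 0.0), "neg"), ((0.2, 0.1), "neg"),
         ((1.0, 1.0), "pos"), ((0.9, 1.1), "pos")]
set_b = [((0.1, 0.0), "neg"), ((1.1, 0.9), "pos"),
         ((0.95, 1.0), "neg")]  # the last label looks suspicious
pool = [(0.05, 0.05), (1.0, 0.95), (0.7, 0.7)]

clf_a = train_centroid_classifier(set_a)
clf_b = train_centroid_classifier(set_b)

errors_in_b = find_error_data(clf_a, set_b)  # classifier A checks set B
errors_in_a = find_error_data(clf_b, set_a)  # classifier B checks set A
contradictory = find_contradictory_data(clf_a, clf_b, pool)
```

In this toy run, classifier A flags the suspicious item in set B, and the pool item lying between the two clusters is the one the classifiers disagree on, which makes it a candidate for operator attention.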

Embodiment 2

[0104] This embodiment provides a system for assisting in labeling model training data, including a sampling module, an error identification module, a data pool refresh module, a data export module, and a data verification module, wherein:

[0105] The sampling module is used to construct two training sets and, after the data pool to be labeled is updated, to obtain multiple new training data items from the new data pool to be labeled and distribute them to the two training sets; that is, in the iterative process, new training sets are constructed based on the new data pool to be labeled. Each training set includes a plurality of training data items, and each item is obtained by sampling from the data pool to be labeled and then being labeled by an operator;

[0106] The error identification module is used to train two classifiers in one-to-one correspondence with the training sets, and to use each classifier to predict the categories of the training data in the other classifier's training set...


Abstract

The invention discloses a method and a system for assisting in labeling model training data. The method comprises the following steps: S1, constructing two training sets; S2, training two classifiers in one-to-one correspondence with the training sets, using each classifier to predict the categories of the training data in the other classifier's training set, obtaining error data whose prediction results are inconsistent with the categories labeled by operators, and extracting error features from the error data; S3, updating the to-be-labeled data pool, obtaining a plurality of new training data items based on the new to-be-labeled data pool, and distributing them to the two training sets; and S4, repeating steps S2-S3 based on the two new training sets until the total number of training data items in the two training sets reaches a preset value, and exporting all the training data in the two training sets. The system for assisting in labeling model training data adopts this method. It screens out valuable model training data and improves both labeling efficiency and labeling quality during the labeling process.
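The iteration in steps S1 to S4 can be sketched as a simple loop. This is only an outline under stated assumptions: the abstract does not say how the pool is updated or how new items are split between the two sets, so `cross_check` and `refresh_pool` are hypothetical callables supplied by the caller, and alternating assignment between the sets is an illustrative choice. All names here are invented for the example.

```python
def draw_batch(pool, label_fn, n, set_a, set_b):
    """Take up to n items from the front of the pool, label them via label_fn
    (standing in for the operator), and alternate them between the two sets.
    Alternation is an assumption; the source only says items are distributed."""
    for i in range(min(n, len(pool))):
        item = pool.pop(0)
        target = set_a if i % 2 == 0 else set_b
        target.append((item, label_fn(item)))


def run_rounds(pool, label_fn, cross_check, refresh_pool, batch_size, target_total):
    """Repeat S2-S3 until the two sets hold target_total items (S4), then export.

    cross_check(set_a, set_b) -> error data found in the current round (S2).
    refresh_pool(pool, errors) -> the updated to-be-labeled pool (S3).
    Both are hypothetical hooks, not part of the source text.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pool = list(pool)
    set_a, set_b = [], []
    draw_batch(pool, label_fn, batch_size, set_a, set_b)  # S1
    while len(set_a) + len(set_b) < target_total and pool:
        errors = cross_check(set_a, set_b)  # S2
        pool = refresh_pool(pool, errors)  # S3: update the pool
        remaining = target_total - len(set_a) - len(set_b)
        draw_batch(pool, label_fn, min(batch_size, remaining), set_a, set_b)
    return set_a + set_b  # S4: export all training data
```

A call such as `run_rounds(range(20), lambda x: x % 2, lambda a, b: [], lambda p, e: p, 4, 10)` runs with no-op hooks and stops once ten labeled items have been collected.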

Description

Technical field

[0001] The present invention relates to the technical field of artificial intelligence, in particular to a method and system for assisting in labeling model training data.

Background technique

[0002] Benefiting from the rapid development of computing resources and the massive data brought by the popularization of the Internet, artificial intelligence technology has developed rapidly. Within it, supervised learning plays an important role. Supervised learning relies on a large amount of labeled data, and data labeling is inefficient. These two characteristics have long hindered the development and application of AI technology. Although more human resources can be invested to alleviate this problem, labeling by a large number of ordinary people with different backgrounds, different ideas, and no professional training is bound to introduce into the labeling results (that is, the labeled data mentioned above) a lot of "noise...


Application Information

IPC(8): G06K9/62, G06N3/04, G06N3/08, G06F40/289, G06F40/242
Inventor: 谢铁
Owner: 南京星云数字技术有限公司