Artificial intelligence data labeling method and device

An artificial intelligence data labeling technology, applied in the field of data processing, that addresses the problems that labeling standards are difficult to keep consistent, that the subjective influence of annotators and reviewers is large, and that accuracy is low, with the effect of reducing labeling errors.

Active Publication Date: 2019-09-20
CHINA ACADEMY OF INFORMATION & COMM

AI Technical Summary

Problems solved by technology

[0004] High labor costs for data labeling: AI algorithm training requires a large number of labeled samples, and today's massive data labeling tasks are carried out manually. As the saying goes, "there is only as much intelligence as there is manual labor," which makes data sets expensive to produce;
[0005] The quality of data labeling is difficult to guarantee: labeling tasks are subject to the subjective influence of labelers and reviewers, which will introduce ce

Examples

Example Embodiment

[0065] Example one

[0066] See Figure 2, which is a schematic diagram of the artificial intelligence data labeling process in an embodiment of this application. The specific steps are:

[0067] Step 201: Obtain a data set to be labeled.

[0068] Step 202: Based on the established AI model, obtain the AI label with the highest probability score for each piece of data to be labeled, together with the corresponding probability score.

[0069] In a specific implementation, one or more established AI models can be used to obtain the AI label with the highest probability score for each piece of data to be labeled, and the corresponding probability score.
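Step 202 for a single model can be sketched as follows. This is a minimal illustration, assuming the model's output is available as a mapping from label to probability score; the function name top_ai_label and the example labels are hypothetical and do not come from the patent text.

```python
from typing import Dict, Tuple


def top_ai_label(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the label with the highest probability score, and that score.

    `scores` is assumed to map each candidate AI label to the probability
    score the model assigned to it for one piece of data.
    """
    if not scores:
        raise ValueError("scores must contain at least one label")
    label = max(scores, key=scores.get)
    return label, scores[label]


# Hypothetical model output for one piece of data to be labeled.
example_scores = {"cat": 0.82, "dog": 0.15, "bird": 0.03}
ai_label, probability_score = top_ai_label(example_scores)
```

The returned pair is what the later threshold comparison operates on: the label is the candidate annotation, and the score is the value compared with the first preset threshold.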

[0070] Taking M AI models as an example, obtaining the AI label with the highest probability score for each piece of data to be labeled, and that probability score, based on the established AI models includes:

[0071] For the data to be labeled, the probability score of each AI label under each model is obtained ...

Example Embodiment

[0088] Example two

[0089] See Figure 3, which is a schematic flow chart, in an embodiment of this application, of using data annotated by the AI model as data samples for training the AI model. The specific steps are:

[0090] Step 301: Obtain a data set to be labeled.

[0091] Step 302: Based on the established AI model, obtain the AI label with the highest probability score for each piece of data to be labeled, together with the corresponding probability score.

[0092] Step 303: For any data to be labeled, determine whether the probability score is greater than a first preset threshold.

[0093] Step 304: When it is determined that the probability score is greater than the first preset threshold and that the data to be labeled is to be spot-checked, a manual label is applied to the data to be labeled.

[0094] Step 305: Determine whether the manual label is consistent with the obtained AI label; if so, step 309 is executed; otherwise, step 308 is e...

Example Embodiment

[0102] Example three

[0103] See Figure 4, which is a schematic diagram of the process of determining whether to update the first threshold according to the accuracy rate in an embodiment of this application. The specific steps are:

[0104] Step 401: Obtain a data set to be labeled.

[0105] Step 402: Based on the established AI model, obtain the AI label with the highest probability score for each piece of data to be labeled, together with the corresponding probability score.

[0106] Step 403: For any data to be labeled, determine whether the probability score is greater than a first preset threshold.

[0107] Step 404: When it is determined that the probability score is greater than the first preset threshold and that the data to be labeled is to be spot-checked, a manual label is applied to the data to be labeled; whether the manual label is consistent with the obtained AI label is determined, and step 406 is executed.

[0108] Step 405: When it is determin...
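The accuracy-driven threshold check of this embodiment might look like the sketch below. The excerpt does not show how the accuracy rate is computed or how the first threshold is updated, so everything here is an assumption made for illustration: accuracy is taken as the share of spot-checked items whose manual label matches the AI label, and the update rule (raise the threshold by a fixed step when accuracy falls below a target) is hypothetical, as are all names and default values.

```python
from typing import Iterable, Tuple


def spot_check_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Share of spot-checked items where the AI label equals the manual label.

    Each pair is (ai_label, manual_label). Assumed definition, not taken
    from the patent text.
    """
    checked = list(pairs)
    if not checked:
        raise ValueError("at least one spot-checked item is required")
    agreed = sum(1 for ai, manual in checked if ai == manual)
    return agreed / len(checked)


def maybe_update_threshold(
    threshold: float,
    accuracy: float,
    target_accuracy: float = 0.95,  # hypothetical target
    step: float = 0.05,             # hypothetical increment
    max_threshold: float = 0.99,    # hypothetical cap
) -> float:
    """Raise the first threshold when accuracy is below target (assumed rule)."""
    if accuracy < target_accuracy:
        return min(threshold + step, max_threshold)
    return threshold


# Three of four spot-checked items agree, so accuracy is 0.75.
checked_pairs = [("cat", "cat"), ("dog", "cat"), ("bird", "bird"), ("cat", "cat")]
accuracy = spot_check_accuracy(checked_pairs)
new_threshold = maybe_update_threshold(0.80, accuracy)
```

Raising the threshold sends more data to manual labeling, which trades labor cost for label quality; other update rules would fit the same flow.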

Abstract

The invention provides an artificial intelligence data labeling method and device. The method comprises: acquiring a data set to be labeled; obtaining, based on an established AI model, the AI label with the highest probability score for each piece of data to be labeled, together with that probability score; for any piece of data to be labeled, determining whether the probability score is greater than a first preset threshold; when the probability score is greater than the first preset threshold and the data is selected for sampling inspection, or when the probability score is not greater than the first preset threshold, applying a manual label to the data; and when the probability score is greater than the first preset threshold and the data is not selected for sampling inspection, labeling the data with the obtained AI label with the highest probability score. The method saves manual labeling cost and implementation time, and reduces labeling errors caused by human subjective factors and by the labelers' technical backgrounds.
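The routing rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the sampling inspection is modeled as an independent random draw at a fixed rate, and the names label_item, sample_rate, and manual_labeler are hypothetical. The source does not specify how items are chosen for sampling inspection.

```python
import random
from typing import Callable, Tuple


def label_item(
    item: str,
    ai_label: str,
    score: float,
    threshold: float,
    sample_rate: float,
    manual_labeler: Callable[[str], str],
    rng: random.Random,
) -> Tuple[str, str]:
    """Return (label, source), where source is "ai" or "manual".

    The AI label is kept only when the score exceeds the first preset
    threshold AND the item is not drawn for sampling inspection.
    Random sampling at `sample_rate` is an assumption of this sketch.
    """
    if score > threshold:
        sampled = rng.random() < sample_rate
        if not sampled:
            return ai_label, "ai"
    # Low-confidence items and sampled high-confidence items go to a person.
    return manual_labeler(item), "manual"


def always_dog(item: str) -> str:
    """Stand-in for a human annotator (hypothetical)."""
    return "dog"


rng = random.Random(0)
high_conf = label_item("img-1", "cat", 0.97, 0.9, 0.0, always_dog, rng)
low_conf = label_item("img-2", "cat", 0.40, 0.9, 0.0, always_dog, rng)
inspected = label_item("img-3", "cat", 0.97, 0.9, 1.0, always_dog, rng)
```

With a sampling rate of 0 every high-confidence item keeps its AI label, and with a rate of 1 every item is labeled manually, which brackets the cost and quality trade-off the method aims to balance.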

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to an artificial intelligence data labeling method and device.

Background

[0002] With the rapid development of technologies such as the Internet, machine learning, big data, and cloud computing, all kinds of information data continue to grow at an exponential rate. In the big data era, artificial intelligence, relying on massive data, has already empowered multiple industries and given rise to a variety of industry applications.

[0003] At present, most of the machine learning and deep learning algorithms that artificial intelligence relies on are data-dependent, requiring a large amount of data to train algorithms in a supervised or semi-supervised manner for customized deployment. Because of the huge volume of big data in China, the complex data types and high data dimensions of various industries pose a huge challenge to the data labeling task. In ge...

Application Information

IPC(8): G06N20/00
CPC: G06N20/00
Inventor: 吕博
Owner: CHINA ACADEMY OF INFORMATION & COMM