Data labeling method and device

A data labeling technology, applied in electrical digital data processing, special data processing applications, digital data information retrieval, and related fields, that addresses the low labeling accuracy and long time consumption of existing labeling methods.

Active Publication Date: 2021-03-05
南京奇元科技有限公司

AI Technical Summary

Problems solved by technology

[0003] Existing data labeling methods usually rely on manual labeling. However, manual labeling takes a long time and is easily affected by the labeler's subjective judgment, resulting in low labeling accuracy.


Examples


Embodiment 1

[0056] Figure 1 shows a schematic flowchart of a data labeling method provided by the first embodiment of the present invention.

[0057] The data labeling method includes the following steps:

[0058] In step S110, each data item in the dataset to be labeled is input into K labeling models, and K labels are obtained for each data item.

[0059] Specifically, the K labeling models are obtained by training on K sub-training sets, respectively. The K sub-training sets are obtained by performing K rounds of random sampling with replacement on the samples in the total training set, where K is an integer greater than 1.
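The sampling and labeling described in step S110 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `make_sub_training_sets` and `label_with_k_models` are hypothetical, each labeling model is assumed to be a plain callable that returns one label, and each sub-training set is assumed to have the same size as the total training set (the text does not state a size).

```python
import random


def make_sub_training_sets(total_training_set, k, seed=0):
    """Draw K sub-training sets by random sampling with replacement.

    Assumption: each sub-training set has the same size as the total
    training set; the source text does not specify the size.
    """
    if k <= 1:
        raise ValueError("K must be an integer greater than 1")
    rng = random.Random(seed)
    n = len(total_training_set)
    # Sampling with replacement: the same sample may appear several
    # times within one sub-training set and across sub-training sets.
    return [
        [total_training_set[rng.randrange(n)] for _ in range(n)]
        for _ in range(k)
    ]


def label_with_k_models(models, dataset):
    """Feed every data item to each of the K models; return K labels per item."""
    return [[model(item) for model in models] for item in dataset]
```

Each sub-training set would then be used to train one labeling model; the training procedure itself is outside this sketch.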

[0060] Since K labeling models are trained in this embodiment, if the sampling were performed without replacement, the similarity between the labeling models would be relatively small and each labeling model would run independently. The results of cross-verification between the K labeling models would then be relatively poor, and the generalization ability of each labeling model would be poor, so the final annotati...

Embodiment 2

[0211] Figure 5 shows a schematic structural diagram of a data labeling device provided by the second embodiment of the present invention. The data labeling device 500 corresponds to the data labeling method in Embodiment 1, and the description of that method also applies to the data labeling device 500, so the details are not repeated here.

[0212] The data labeling device 500 includes an input module 510, a sample determination module 520, and a labeling module 530.

[0213] The input module 510 is used to input each data item in the dataset to be labeled into K labeling models, so that K labels are obtained for each data item. The K labeling models are obtained by training on K sub-training sets, respectively; the K sub-training sets are obtained by performing K rounds of random sampling with replacement on the samples in the total training set, and K is an integer greater than 1.

[0214] The sample determination module 520 is configured to classify the label...


Abstract

The invention discloses a data labeling method and device. The method comprises: inputting each piece of data in a to-be-labeled data set into K labeling models to obtain K labels for each piece of data, wherein the K labeling models are obtained by training on K sub-training sets, the K sub-training sets are obtained by performing K rounds of random sampling with replacement on the samples in a total training set, and K is an integer greater than 1; dividing the data corresponding to the labels into samples with different confusion degrees based on the confidence of the labels, the confidence being the degree of consistency among the K labels obtained for each piece of data; and, in a preset stage, sequentially labeling the samples with different confusion degrees to obtain a label for each piece of data in the to-be-labeled data set. In this technical scheme, samples with different confusion degrees are compared and verified by the K trained labeling models, so that they are labeled automatically and labor and time costs are greatly reduced.
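The confidence-based division described in the abstract can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: confidence is computed as the fraction of the K labels that agree with the majority label, the three confusion levels ("low", "medium", "high") and the thresholds 1.0 and 0.6 are hypothetical, and the names `agreement` and `split_by_confusion` do not come from the patent.

```python
from collections import Counter


def agreement(labels):
    """Return (majority_label, confidence) for the K labels of one data item.

    Assumption: confidence is the share of the K labels that match the
    most common label. The source only says it is the degree of
    consistency among the K labels.
    """
    majority_label, votes = Counter(labels).most_common(1)[0]
    return majority_label, votes / len(labels)


def split_by_confusion(k_labels_per_item, low=1.0, medium=0.6):
    """Group data items into confusion levels by label confidence.

    The thresholds `low` and `medium` are hypothetical values.
    """
    buckets = {"low": [], "medium": [], "high": []}
    for index, labels in enumerate(k_labels_per_item):
        label, conf = agreement(labels)
        if conf >= low:
            level = "low"      # all K models agree
        elif conf >= medium:
            level = "medium"   # most models agree
        else:
            level = "high"     # models disagree strongly
        buckets[level].append((index, label, conf))
    return buckets
```

Under these assumptions, items in the "low" bucket could take their majority label directly, while the more confused buckets would be handled in later stages.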

Description

Technical field

[0001] The present invention relates to the technical field of artificial intelligence, and in particular to a data labeling method and device.

Background art

[0002] With the rapid development of science and technology, artificial intelligence has become a focus of attention. Supported by technological advances such as big data, artificial intelligence has produced fruitful results in data analysis, image recognition, smart home, autonomous driving, and other fields. Driven by massive amounts of data and centered on deep learning algorithms, artificial intelligence technology gives machines a preliminary form of the basic visual and auditory abilities of humans, and may make them competent for relatively complex mental work. Because deep learning algorithms require large amounts of data, the labeling of massive data has become an urgent market demand.

[0003] One of the existing data labeling methods usually us...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/2457
CPC: G06F16/24573
Inventor: 程会云史明王西颖
Owner: 南京奇元科技有限公司