Method for integrating crowdsourced annotations

A crowdsourced annotation integration technique, applied in the field of data mining and machine learning, that addresses shortcomings of earlier models, such as far-fetched assumptions about labelers and the failure to account for labelers' tendencies and the differences between them.

Active Publication Date: 2016-05-25
BEIJING REALAI TECH CO LTD

AI Technical Summary

Problems solved by technology

Compared with the majority voting model, this algorithm adds considerable detail: it makes a preliminary assumption about the source of labelers' errors and gives a more rigorous, probabilistic problem statement. However, this approach rests on several far-fetched assumptions. First, under this model, the probability that an annotator marks a picture of one category as another category is a fixed value, but as the pictu

Examples


Embodiment Construction

[0048] The present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments. The following examples are used to illustrate the present invention, but should not be used to limit the scope of the present invention.

[0049] As shown in Figure 1, a crowdsourcing annotation integration method includes the following steps:

[0050] S1. Set the confusion matrix hyperparameter, the interval distance hyperparameter, and the regularization hyperparameter;

[0051] S2. Initialize the annotators' voting weights, and use majority voting to set an initial value for the to-be-estimated annotation of every prediction item;

[0052] S3. According to the initial values of all prediction items obtained in step S2, or the estimated (updated) values obtained in the previous iteration, count the number of times each annotator marks each prediction item as each predetermined category, wherei...
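Steps S2 and S3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the data layout (`labels_by_item`), the function names, the tie-breaking rule, and the reading of the S3 counts as a per-annotator table indexed by current estimate and given label are all assumptions made for illustration, since the text of S3 is truncated.

```python
from collections import Counter


def majority_vote(labels_by_item):
    """S2 sketch: choose the most frequent label for each item.

    Ties are broken by taking the smallest label (an assumption).
    """
    estimates = {}
    for item, votes in labels_by_item.items():
        tally = Counter(votes.values())
        top = max(tally.values())
        estimates[item] = min(k for k, c in tally.items() if c == top)
    return estimates


def count_confusions(labels_by_item, estimates, num_classes):
    """S3 sketch: counts[annotator][k][l] is the number of items whose
    current estimate is k and which the annotator labeled as l."""
    counts = {}
    for item, votes in labels_by_item.items():
        k = estimates[item]
        for annotator, label in votes.items():
            table = counts.setdefault(
                annotator, [[0] * num_classes for _ in range(num_classes)]
            )
            table[k][label] += 1
    return counts


# Hypothetical toy data: three items, three annotators, two categories.
labels_by_item = {
    "img1": {"a": 0, "b": 0, "c": 1},
    "img2": {"a": 1, "b": 1, "c": 1},
    "img3": {"a": 0, "b": 1, "c": 1},
}
estimates = majority_vote(labels_by_item)
counts = count_confusions(labels_by_item, estimates, num_classes=2)
print(estimates)
print(counts["a"])
```

In an iterative scheme, `count_confusions` would be called again each round with the updated estimates rather than the majority-vote initial values.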


Abstract

The present invention provides a method for integrating crowdsourced annotations. A generalized inverse Gaussian distribution is defined using a regularization hyperparameter, an interval distance hyperparameter, the annotator voting weights, and the difference between the number of times annotators label the current prediction item as its current estimate and the number of times they label it as a subcategory. An auxiliary parameter is sampled from this distribution and used to update the annotator weights, which significantly enhances the discriminative capability of the model. The method also combines the traditional majority voting model with the confusion matrix model, giving a more comprehensive description of the data generation process. In addition, accurate updated values for the prediction items are obtained through sampling, and running efficiency is improved.
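The vote difference mentioned in the abstract can be sketched as a weighted margin. The following Python snippet is an illustration only: the function name, the data layout, and the reading of "subcategory" as the strongest competing category are assumptions, and the sampling of the auxiliary parameter is not shown.

```python
def weighted_margin(votes, weights, estimate, num_classes):
    """Weighted votes for the current estimate minus the weighted votes
    for the strongest competing category (an assumed interpretation)."""
    scores = [0.0] * num_classes
    for annotator, label in votes.items():
        scores[label] += weights[annotator]
    runner_up = max(s for k, s in enumerate(scores) if k != estimate)
    return scores[estimate] - runner_up


# Hypothetical example: one item, three annotators, two categories.
votes = {"a": 0, "b": 0, "c": 1}
uniform = {"a": 1.0, "b": 1.0, "c": 1.0}
learned = {"a": 1.0, "b": 0.5, "c": 2.0}

margin_uniform = weighted_margin(votes, uniform, estimate=0, num_classes=2)
margin_learned = weighted_margin(votes, learned, estimate=0, num_classes=2)
print(margin_uniform, margin_learned)
```

With uniform weights the margin reduces to the plain vote difference used by majority voting. With unequal weights a single heavily weighted annotator can outweigh two lightly weighted ones, which is what lets learned weights change the estimate.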

Description

Technical field

[0001] The invention belongs to the technical field of data mining and machine learning, and more specifically relates to a method for integrating crowdsourced annotations.

Background

[0002] With the explosive growth of Internet data and the widespread application of statistical machine learning algorithms, large-scale labeled datasets have come to play a prominent role in machine learning. At the same time, crowdsourcing is becoming an increasingly important way to obtain data annotations. Crowdsourcing refers to dividing a workload into a large number of simple subtasks and assigning them to a large number of ordinary internet users through an online platform. This approach is currently widely used to collect and label large-scale datasets such as ImageNet. Compared with the traditional labeling method, data labeling through crowdsourcing is completed by a large number of ordinary internet users at the same time, which ha...


Application Information

IPC(8): G06F19/00
CPC: G16Z99/00
Inventors: 朱军, 田天
Owner: BEIJING REALAI TECH CO LTD