Integrated weighted majority soft voting crowdsourcing data truth value reasoning method

A truth inference method based on soft voting, applicable to instruments, character and pattern recognition, computer components, and similar fields. It addresses problems such as existing methods failing to consider instance features and labeling quality, and it is highly implementable.

Pending Publication Date: 2021-07-20
HANGZHOU DIANZI UNIV

AI Technical Summary

Problems solved by technology

Existing truth inference methods do not consider the features of the instances or the impact of different workers on the annotation quality of different instances.

Examples

Embodiment 1

[0055] Referring to Figure 1 and Figure 2, the present invention provides an integrated weighted majority soft voting truth inference method for crowdsourced data. The specific process is described in detail below.

[0056] Step 1. Convert the data into a new crowdsourced dataset for training weak classifiers: calculate the probability that each instance belongs to each category, make K-1 copies of the instance, and associate each of the resulting K copies with a different category label k = 1, 2, 3, ..., K. This removes the influence of speculative aggregated labels, improves classification accuracy, and benefits the performance of truth inference;

[0057] Step 2. Aggregate the weak classifiers using a method based on maximum likelihood estimation;

[0058] Step 2.1. Obtain the set Π of confusion matrices of all weak classifiers from the statistics of Step 1;

[0059] Step 2.2 obtains the new classifier prediction label according to the maximum likelihood est...
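Steps 1 and 2 can be illustrated with a minimal sketch. It is not the patented procedure: the source does not say how the class probabilities are computed, how they are attached to the copies, or how the confusion matrices are estimated. The sketch assumes vote frequencies as probabilities, uses them as per-copy sample weights, and estimates each confusion matrix against a reference labeling with additive smoothing. The function names `expand_instances`, `confusion_matrices`, and `ml_aggregate` are hypothetical.

```python
import numpy as np

def expand_instances(features, votes_per_instance, num_classes):
    """Step 1 sketch: turn each instance into K copies, one per class label."""
    X, y, w = [], [], []
    for x, votes in zip(features, votes_per_instance):
        counts = np.bincount(votes, minlength=num_classes)
        probs = counts / counts.sum()  # assumption: vote frequency as probability
        for k in range(num_classes):
            X.append(x)
            y.append(k)
            w.append(probs[k])  # assumption: probability used as sample weight
    return np.array(X), np.array(y), np.array(w)

def confusion_matrices(preds, reference, num_classes, smoothing=1.0):
    """Step 2.1 sketch: pi[m, t, p] = P(classifier m predicts p | reference t).

    preds: int array (M, N) of weak-classifier predictions
    reference: int array (N,) of reference labels (an assumption of this sketch)
    """
    num_models = preds.shape[0]
    pi = np.full((num_models, num_classes, num_classes), smoothing)
    for m in range(num_models):
        np.add.at(pi[m], (reference, preds[m]), 1.0)  # count (reference, predicted) pairs
    return pi / pi.sum(axis=2, keepdims=True)  # normalize each row to a distribution

def ml_aggregate(preds, pi, prior):
    """Step 2.2 sketch: per instance, pick the class with the highest log-likelihood."""
    num_models, num_instances = preds.shape
    log_post = np.tile(np.log(prior), (num_instances, 1))  # shape (N, K)
    for m in range(num_models):
        # pi[m][:, preds[m]] has shape (K, N): likelihood of each candidate class
        log_post += np.log(pi[m][:, preds[m]]).T
    return log_post.argmax(axis=1)
```

The weighted copies returned by `expand_instances` could be passed to any learner that accepts per-sample weights. `ml_aggregate` treats the weak classifiers as conditionally independent given the true class, which is a simplifying assumption.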

Embodiment 2

[0064] The difference from Embodiment 1 is that each worker's ability to label different instances is also considered, and the predicted labels are obtained through a soft voting method based on worker weights (see Figure 3). The specific steps are as follows:

[0065] Step 3. Account for the different labeling capabilities of workers on different instances, and calculate worker weights using a method based on similarity comparison;

[0066] Step 3.1. Calculate the overall quality of each worker by comparing the similarity between the worker's labels and the labels predicted by the strong classifier. The related formula is as follows:

[0067]

[0068] where f(x_i) is the class label predicted by the classifier from the feature vector x_i, τ_j denotes the overall quality of the j-th worker, and I denotes the total number of instances;

[0069] Step 3.2. Obtain the specific labeling quality of each worker by comparing the labels of the workers. If two workers have the same labeling...
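The formula for the overall worker quality τ_j in Step 3.1 is not reproduced in this text, so the following is only a sketch under a stated assumption: τ_j is taken to be the fraction of the I instances on which worker j's label equals the predicted label f(x_i). The function name `overall_quality` and the use of -1 for a missing label are hypothetical conventions of this example.

```python
import numpy as np

def overall_quality(worker_labels, predicted):
    """Assumed form of tau_j: share of the I instances where worker j matches f(x_i).

    worker_labels: int array of shape (I, J); -1 marks a missing label
    predicted: int array of shape (I,), the classifier's labels f(x_i)
    """
    # A missing label (-1) never equals a valid class, so it counts as no match.
    agree = worker_labels == predicted[:, None]
    return agree.sum(axis=0) / worker_labels.shape[0]

labels = np.array([[0, 1], [1, 1], [0, -1], [1, 0]])
tau = overall_quality(labels, np.array([0, 1, 0, 1]))
```

Dividing by I rather than by the number of instances a worker actually labeled follows the source's statement that I is the total number of instances. Under this assumption, a worker who labels few instances receives a low τ_j.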

Embodiment 3

[0077] The algorithm model of the integrated weighted majority soft voting truth inference method for crowdsourced data is shown in Figure 4. The main steps of the algorithm are described in detail below.

[0078] Input: D: crowdsourced dataset,

[0079] M: the number of weak classifiers;

[0080] Output: aggregated labels;

[0081] 1. Load the crowdsourced dataset D and divide it in a certain proportion into a training set D_T and a test set D_L; for each weak classifier, resample D_T to generate a sub-dataset;

[0082] 2. Calculate the proportion Pr of positive and negative categories in the sub-dataset, convert the dataset, and train the weak classifier h_i(x);

[0083] 3. Aggregate the weak classifiers based on maximum likelihood estimation;

[0084] 4. Compare l_ij with the predicted label H(x_i) to obtain:

[0085]

[0086] 5. Compare the similarity of each worker

[0087] 6. Combine τ_j and s_ij to get the reliability of the jth worke...
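The final aggregation by weighted soft voting can be sketched as follows. How τ_j and s_ij are combined into a worker's reliability is cut off in this text, so the sketch takes a per-instance weight matrix as given rather than guessing the combination rule. The function name `weighted_soft_vote`, the weight layout, and the -1 marker for missing labels are assumptions of this example.

```python
import numpy as np

def weighted_soft_vote(worker_labels, weights, num_classes):
    """Aggregate worker labels into class probabilities using per-instance weights.

    worker_labels: int array (I, J); -1 marks a missing label
    weights: float array (I, J); assumed reliability of worker j on instance i
    Returns (labels, probs) with probs of shape (I, num_classes).
    """
    num_instances = worker_labels.shape[0]
    scores = np.zeros((num_instances, num_classes))
    for k in range(num_classes):
        # Sum the weights of the workers who voted for class k on each instance.
        scores[:, k] = np.where(worker_labels == k, weights, 0.0).sum(axis=1)
    totals = scores.sum(axis=1, keepdims=True)
    # Normalize to probabilities; fall back to uniform if an instance has no votes.
    uniform = np.full_like(scores, 1.0 / num_classes)
    probs = np.divide(scores, totals, out=uniform, where=totals > 0)
    return probs.argmax(axis=1), probs

votes = np.array([[0, 1, 1], [1, 0, -1]])
w = np.array([[0.9, 0.2, 0.3], [0.4, 0.6, 0.5]])
labels, probs = weighted_soft_vote(votes, w, 2)
```

In the first row, two of three workers vote for class 1, but the single vote for class 0 carries more weight (0.9 against 0.5), so the aggregated label is 0. This is how weighted soft voting can depart from a plain majority vote.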

Abstract

The invention discloses an integrated weighted majority soft voting truth inference method for crowdsourced data. The method comprises the following steps: step 1, calculating the probability that each instance belongs to each category and making K-1 copies of the instance, so as to convert the data into a new crowdsourced dataset for training weak classifiers; step 2, aggregating the weak classifiers using a method based on maximum likelihood estimation; step 3, introducing the different labeling capabilities of workers on different instances, and calculating worker weights using a method based on similarity comparison; and step 4, aggregating and generating the inferred labels using a weighted soft voting method. The invention not only introduces the features of the instances but also comprehensively considers the labeling capabilities of different workers on different instances, quantifying these capabilities through weights obtained by comparing the similarity of predicted labels and worker labels. A weighted soft voting method based on worker weights is proposed to predict the final label. The method has relatively high implementability.

Description

Technical field

[0001] The invention belongs to the field of data mining, and in particular relates to an integrated weighted majority soft voting truth inference method for crowdsourced data.

Background

[0002] The field of data mining requires a large amount of high-quality labeled data to train models, and crowdsourced labeling is a relatively effective and economical way to obtain labeled data. A crowdsourcing platform divides a task into smaller task units and assigns them to online workers for labeling, thereby obtaining a large amount of labeled data. Because the overall quality of the platform's labeling personnel is uncertain, the overall quality of crowdsourced labels is lower than that of expert labels. To address this quality problem, the true label is generally inferred by a truth inference method. In data annotation in a crowdsourcing system, due to the different annotation levels...

Application Information

IPC(8): G06K9/62
CPC: G06F18/22, G06F18/2415
Inventor: 张桦, 徐宏, 沈菲, 蒋世豪, 张灵均, 吴以凡
Owner: HANGZHOU DIANZI UNIV