A semi-supervised crowdsourced labeling data integration method for label imbalance

A crowdsourced labeling technique for label imbalance, applied to semi-supervised integration of crowdsourced labeling data. It addresses label imbalance and the differing weights between labels, and aims to make the evaluation of weighting parameters and of labeler ability more objective and accurate, yielding more accurate labeling results.

Status: Inactive; Publication Date: 2019-01-15
ZHEJIANG UNIV
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

Although the above algorithm improves the integration accuracy of labeled data to some extent, it assumes during final result integration that every label type is equally likely to be assigned.
In actual labeling, however, label types are often imbalanced and carry different weights, so weighting parameters must be introduced to rebalance the relationship between labels.
Moreover, the weights differ from one labeling task to another, so they can only be learned from the actual task and cannot be preset.



Examples


Embodiment Construction

[0045] In order to describe the present invention more specifically, the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0046] The flow of the method of the invention is shown in Figure 1 and specifically includes the following steps:

[0047] Step (1): Evaluate the weighting parameters. From the set of correct results and the corresponding crowdsourced annotation results, find the weighting parameter set $\{w_j \mid j \in [1, C]\}$ (that is, W). Here the correct result gives the correct label for the m-th object; the annotation results record the number of times the k-th worker labeled the m-th object as j, taken over the training set corresponding to the correct results; $w_j$ is the weight of the j-th label; M is the total number of objects, C is the number of label types, and K is the total number of labelers. The method is introduced below, and the step...
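The quantities defined in Step (1) map naturally onto arrays. The sketch below is a minimal illustration in Python with NumPy, not the patent's estimator: it assumes the annotation counts are stored in an array of shape (M, K, C) and that the weights w_j are already given, and it shows only how per-label weights could rebalance a vote. The function name `weighted_vote` and the toy numbers are hypothetical.

```python
import numpy as np

def weighted_vote(counts, weights):
    """Integrate crowdsourced labels with per-label weights (illustrative only).

    counts:  array of shape (M, K, C); counts[m, k, j] is the number of times
             worker k labeled object m as label j.
    weights: array of shape (C,); weights[j] plays the role of w_j.
    Returns an array of shape (M,) with the winning label index per object.
    """
    counts = np.asarray(counts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if counts.ndim != 3 or weights.shape != (counts.shape[2],):
        raise ValueError("expected counts of shape (M, K, C) and weights of shape (C,)")
    votes = counts.sum(axis=1)   # (M, C): total votes per object and label
    scores = votes * weights     # scale each label's votes by its weight
    return scores.argmax(axis=1)

# Toy data (hypothetical): M = 3 objects, K = 3 workers, C = 2 label types.
counts = np.zeros((3, 3, 2), dtype=int)
counts[0, 0, 0] = 1   # object 0: workers 0 and 1 choose label 0 ...
counts[0, 1, 0] = 1
counts[0, 2, 1] = 1   # ... and worker 2 chooses label 1
counts[1, :, 0] = 1   # object 1: every worker chooses label 0
counts[2, :, 1] = 1   # object 2: every worker chooses label 1

uniform = weighted_vote(counts, [1.0, 1.0])   # equal weights: plain majority vote
skewed = weighted_vote(counts, [1.0, 3.0])    # label 1 weighted three times higher
```

With equal weights, object 0 goes to the majority label 0; once label 1 carries a larger weight, the single vote for label 1 outweighs the two votes for label 0. This is the kind of rebalancing that a learned weight set W would control.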



Abstract

The invention discloses a semi-supervised crowdsourced labeling data integration method for label imbalance. It is based on two phenomena: (1) a labeler's accuracy is independent of the object being labeled; (2) the weights considered in different labeling tasks of the object are the same. A new method for evaluating the weighting parameters and a method for evaluating labeler ability are proposed, and a semi-supervised crowdsourced labeling data integration method for label imbalance is constructed and solved iteratively, so that the evaluation of the weighting parameters and of labeler ability is more objective and accurate and the integrated labeling results are more accurate. The invention is applicable to many types of crowdsourced labeling data, including but not limited to multi-category labeling of images, text, and video.
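The iterative solving mentioned in the abstract can be illustrated with a simple alternating loop. The following Python sketch is an illustration under stated assumptions, not the patented procedure: it assumes one scalar ability per labeler (in line with accuracy being independent of the object), takes the label weights as given, estimates ability as a smoothed agreement rate with the current labels, and keeps the known correct labels fixed as the semi-supervised part. The function name `integrate`, the smoothing constants, and the toy data are all hypothetical.

```python
import numpy as np

def integrate(counts, weights, gold, n_iter=20):
    """Alternate between label integration and labeler-ability estimation.

    counts:  (M, K, C) array; counts[m, k, j] = times worker k labeled object m as j.
    weights: (C,) per-label weights, assumed to be already estimated.
    gold:    dict mapping object index -> known correct label (may be empty).
    Returns (labels, ability): integrated labels (M,) and abilities (K,).
    """
    counts = np.asarray(counts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    M, K, C = counts.shape
    gold_idx = np.array(sorted(gold), dtype=int)
    gold_lab = np.array([gold[m] for m in sorted(gold)], dtype=int)

    ability = np.ones(K)   # start with all labelers treated as equally able
    labels = None
    for _ in range(n_iter):
        # scores[m, j] = w_j * sum_k ability_k * counts[m, k, j]
        scores = np.einsum("k,mkj->mj", ability, counts) * weights
        new_labels = scores.argmax(axis=1)
        new_labels[gold_idx] = gold_lab   # known correct labels stay fixed

        # Ability: smoothed share of a worker's annotations that match the labels.
        agree = counts[np.arange(M), :, new_labels].sum(axis=0)   # (K,)
        total = counts.sum(axis=(0, 2))                           # (K,)
        ability = (agree + 1.0) / (total + 2.0)

        if labels is not None and np.array_equal(new_labels, labels):
            break   # labels stopped changing
        labels = new_labels
    return labels, ability

# Toy data (hypothetical): workers 0 and 1 are always right, worker 2 always wrong.
truth = [0, 0, 1, 1]
counts = np.zeros((4, 3, 2))
for m, t in enumerate(truth):
    counts[m, 0, t] = 1
    counts[m, 1, t] = 1
    counts[m, 2, 1 - t] = 1

labels, ability = integrate(counts, [1.0, 1.0], gold={0: 0})
```

On this toy input the loop settles after two passes: the integrated labels match `truth`, and the always-wrong worker ends up with a much lower ability than the other two, so its votes count for less in later passes.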

Description

Technical field

[0001] The invention belongs to the technical field of data labeling, and in particular relates to a semi-supervised crowdsourced labeling data integration method for label imbalance. The method jointly considers weighting parameters and labeler ability.

Background

[0002] With the advent of the big data era, extracting knowledge from big data has become one of the most important research directions in computing, attracting attention from fields such as artificial intelligence and machine learning. Machine learning methods rely on high-quality labeled datasets for algorithm and model training, so building high-quality datasets quickly and efficiently is of great significance. In the past, dataset construction relied mainly on expert labeling: hired experts labeled the data manually over a period of high-intensity work. This method has the characteristics of high quality, high c...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62
CPC: G06F18/217, G06F18/214
Inventor: 王东辉, 洪高峰, 李亚楠, 蔺越檀, 庄越挺
Owner: ZHEJIANG UNIV