
Semi-supervised crowdsourced labeling data integration method oriented to label imbalance

A technology concerning label imbalance and crowdsourced labeling, applied in the field of semi-supervised crowdsourced labeling data integration oriented to label imbalance. It addresses two problems: label types are imbalanced, and label weights differ both between labels and between labeling tasks.

Status: Inactive. Publication date: 2016-07-20
ZHEJIANG UNIV
Cites: 3; cited by: 9

AI Technical Summary

Problems solved by technology

Although the above algorithms improve the integration accuracy of labeled data to a certain extent, they assume, when integrating the final result, that every label type has the same probability of being assigned.
In actual labeling, however, label types are often imbalanced and carry different weights, so weighting parameters must be introduced to rebalance the labels.
Moreover, these weights differ from one labeling task to another; they can only be learned from the actual task and cannot be preset.



Detailed Description of Embodiments

[0045] To describe the present invention more specifically, its technical solutions are described in detail below with reference to the accompanying drawings and specific embodiments.

[0046] The flow of the method of the invention is shown in Figure 1. It specifically includes the following steps:

[0047] Step (1): Weighted-parameter evaluation. From the set of correct results and the corresponding crowdsourced annotation results, compute the weighted parameter set $W = \{w_j \mid j \in [1, C]\}$. Here the correct result gives the correct label of the m-th object; the annotation results record the number of times the k-th worker labeled the m-th object as j; together they form the training set corresponding to the correct result. $w_j$ is the weight of the j-th label, M is the total number of objects, C is the number of label types, and K is the total number of labelers. The method is introduced below, and the step...
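Step (1) states that the label weights $w_j$ are estimated from a training set of correct labels plus crowd annotation counts, but the actual formula is elided in this excerpt. As a rough illustration of what a rebalancing weight could look like, the Python sketch below assumes a simple ratio estimator: $w_j$ is the share of training objects whose correct label is $j$ divided by the share of all crowd votes cast for $j$, and an object is then integrated by a weighted vote. The data layout `counts[m][k][j]`, the functions `estimate_weights` and `weighted_vote`, and the ratio formula itself are hypothetical and are not taken from the patent.

```python
from typing import List

# counts[m][k][j]: number of times worker k labeled object m as label j
# (the count described in Step (1)); workers who skipped an object have all zeros.
Counts = List[List[List[int]]]


def estimate_weights(counts: Counts, truth: List[int], num_labels: int) -> List[float]:
    """Hypothetical estimator, NOT the patent's formula:
    w_j = (share of training objects whose correct label is j)
          / (share of all crowd votes cast for label j).
    A label the crowd under-reports relative to the ground truth gets w_j > 1."""
    true_count = [0] * num_labels
    vote_count = [0] * num_labels
    for m, per_object in enumerate(counts):
        true_count[truth[m]] += 1
        for per_worker in per_object:
            for j, n in enumerate(per_worker):
                vote_count[j] += n
    total_objects = len(truth)
    total_votes = sum(vote_count)
    weights = []
    for j in range(num_labels):
        p_true = true_count[j] / total_objects
        p_vote = vote_count[j] / total_votes if total_votes else 0.0
        # Neutral weight if label j never received a vote in the training set
        weights.append(p_true / p_vote if p_vote > 0 else 1.0)
    return weights


def weighted_vote(per_object: List[List[int]], weights: List[float]) -> int:
    """Integrate one object's annotations as argmax_j of w_j * (total votes for j)."""
    scores = [weights[j] * sum(worker[j] for worker in per_object)
              for j in range(len(weights))]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up toy training set: 4 objects, 3 workers, 2 labels; label 1 is rare.
    counts = [
        [[1, 0], [1, 0], [1, 0]],  # correct label 0
        [[0, 1], [1, 0], [1, 0]],  # correct label 1, but the crowd mostly says 0
        [[1, 0], [1, 0], [1, 0]],  # correct label 0
        [[1, 0], [1, 0], [0, 1]],  # correct label 0
    ]
    truth = [0, 1, 0, 0]
    w = estimate_weights(counts, truth, num_labels=2)  # roughly [0.9, 1.5]
    new_object = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]]  # 3 votes for 0, 2 for 1
    print(weighted_vote(new_object, w))           # → 1
    print(weighted_vote(new_object, [1.0, 1.0]))  # → 0 (unweighted majority)
```

In this toy data the crowd casts only 2 of 12 votes for label 1 although 1 of 4 objects truly carries it, so label 1 receives a weight above 1. That is enough to turn a 3-to-2 vote in favor of label 0 into an integrated label of 1, which is the kind of rebalancing the problem statement asks for.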


Abstract

The invention discloses a semi-supervised crowdsourced labeling data integration method oriented to label imbalance. Based on two observations, namely that (1) a labeler's labeling accuracy does not depend on the object being labeled, and (2) the weights are the same across the different labeling tasks of the same object, the invention provides a new weighted-parameter evaluation method and a labeler-capability evaluation method, and on that basis the semi-supervised crowdsourced labeling data integration method oriented to label imbalance. The solution is computed iteratively, which makes the estimates of the weighted parameters and of labeler capability more objective and accurate, and hence the integrated labeling result more accurate. The method is suitable for different types of crowdsourced labeling data, including but not limited to images, text, and video.
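The abstract says the weighted parameters and labeler capability are solved iteratively, and that labeling accuracy is treated as independent of the object. The exact update rules are not in this excerpt, so the Python sketch below shows only one plausible alternating scheme under stated assumptions: the label weights $w_j$ are given and held fixed (for example, estimated beforehand from the training set); each worker's capability is a single object-independent number, taken as the share of that worker's votes that agree with the current integrated labels; and objects with known correct labels stay clamped to them, which is what makes the scheme semi-supervised. The function `integrate`, its arguments, and these update rules are hypothetical. Re-estimating $w_j$ inside the loop, which the abstract also mentions, is omitted to keep the sketch short.

```python
from typing import List, Optional, Tuple

# counts[m][k][j]: times worker k labeled object m as label j (zeros if k skipped m).
Counts = List[List[List[int]]]


def integrate(counts: Counts, weights: List[float], known: List[Optional[int]],
              max_iter: int = 50) -> Tuple[List[int], List[float]]:
    """Hypothetical alternating scheme, NOT the patent's exact update rules.

    known[m] is the correct label for training objects (the semi-supervised part)
    and None for objects whose label must be inferred. weights[j] is a label
    weight w_j, assumed to be estimated beforehand and held fixed here.
    Returns (integrated labels, per-worker ability)."""
    num_objects, num_workers, num_labels = len(counts), len(counts[0]), len(weights)
    ability = [1.0] * num_workers  # start by trusting every worker equally
    labels: List[int] = []
    for _ in range(max_iter):
        # (a) Integrate: score_j = w_j * sum_k ability_k * n_mkj; known labels stay clamped.
        new_labels = []
        for m in range(num_objects):
            if known[m] is not None:
                new_labels.append(known[m])
                continue
            scores = [weights[j] * sum(ability[k] * counts[m][k][j] for k in range(num_workers))
                      for j in range(num_labels)]
            new_labels.append(max(range(num_labels), key=scores.__getitem__))
        # (b) Ability: share of worker k's votes that agree with the current labels.
        # One number per worker, i.e. independent of the object being labeled.
        for k in range(num_workers):
            votes = sum(sum(counts[m][k]) for m in range(num_objects))
            agree = sum(counts[m][k][new_labels[m]] for m in range(num_objects))
            ability[k] = agree / votes if votes else 0.0
        if new_labels == labels:
            break  # converged: the integrated labels stopped changing
        labels = new_labels
    return labels, ability


if __name__ == "__main__":
    # Made-up toy data: 4 workers, 2 labels; workers 2 and 3 are unreliable.
    counts = [
        [[1, 0], [1, 0], [0, 1], [0, 1]],  # known label 0
        [[0, 1], [0, 1], [1, 0], [1, 0]],  # known label 1
        [[1, 0], [1, 0], [0, 1], [0, 1]],  # known label 0
        [[0, 1], [0, 0], [1, 0], [1, 0]],  # unknown; worker 1 skipped it
    ]
    labels, ability = integrate(counts, [1.0, 1.0], [0, 1, 0, None])
    print(labels)   # → [0, 1, 0, 1]
    print(ability)  # → [1.0, 1.0, 0.0, 0.0]
```

A plain majority vote would label the fourth object 0 (two votes against one). The iteration first lowers the ability of workers 2 and 3, who disagree with every known label, and on the next pass the fourth object flips to label 1. Because the known labels never change, they anchor the ability estimates and keep the iteration from drifting.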

Description

Technical Field

[0001] The invention belongs to the technical field of data labeling, and specifically relates to a semi-supervised crowdsourced labeling data integration method oriented to label imbalance. The method jointly considers weighting parameters and labeler capability.

Background

[0002] With the advent of the big data era, extracting knowledge from big data has become one of the most important research directions in computing, attracting attention from fields such as artificial intelligence and machine learning. Machine learning and similar methods rely on high-quality labeled datasets to train algorithms and models, so building high-quality datasets quickly and efficiently is of great significance. In the past, datasets were built mainly through expert labeling: experts were hired and labeled the data manually over a period of high-intensity work. This method has the characteristics of high quality, high c...


Application Information

Patent type and authority: Application (China)
IPC (8th edition): G06K9/62
CPC: G06F18/217, G06F18/214
Inventors: 王东辉 (Wang Donghui), 洪高峰 (Hong Gaofeng), 李亚楠 (Li Yanan), 蔺越檀 (Lin Yuetan), 庄越挺 (Zhuang Yueting)
Owner: ZHEJIANG UNIV