
Method for integrating crowdsource annotation data based on task difficulty and annotator ability

A crowdsourced labeling and data-integration technology, applied in the field of integrating crowdsourced annotation data based on task difficulty and annotator ability. It addresses problems such as large deviations in labeling results, biased evaluation of annotators, and reduced integration accuracy.

Inactive Publication Date: 2015-04-29
ZHEJIANG UNIV
Cites: 2 · Cited by: 44

AI Technical Summary

Problems solved by technology

Although the above methods improve the integration accuracy of labeled data to some extent, they define an annotator's ability solely by the consistency between all of that annotator's labels and the final integrated label of each task. However, the integrated labels are not necessarily correct, so the ability estimates are biased; as a result, the accuracy of the final labels produced by such worker-ability-based integration models also deviates considerably.

At the same time, current integration models lack any evaluation of task difficulty, an important influencing factor, and thus ignore its role throughout the integration process, which further degrades the final labeling results.


Embodiment Construction

[0040] To describe the present invention more specifically, the technical solutions of the invention are described in detail below with reference to the accompanying drawings and specific embodiments.

[0041] The flow of the inventive method is shown in Figure 1 and specifically includes the following steps:

[0042] Step (1): Evaluation of task difficulty. Task difficulty is evaluated from the collected labeled data set, which contains the labeling result given by the w-th annotator for the i-th task, to find the difficulty set of all tasks {D_i | i ∈ [1, a]}; here D_i denotes the difficulty of the i-th task, a is the total number of tasks, and W is the total number of annotators (w ∈ [1, W]). The method is described below taking the difficulty of the i-th task as an example; the steps are as follows:

[0043] 1-1: From the collected annotation data, compute by counting the number K of distinct labeling results that all annotators gave for the...
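The patent text above only names K, the number of distinct labeling results per task, before truncating. As a rough illustration of how a per-task difficulty score D_i could be derived from such counts, the sketch below uses the normalized entropy of the label distribution; the formula and function name are assumptions, not the patent's actual definition.

```python
from collections import Counter
import math

def task_difficulty(labels):
    """Illustrative difficulty score for one task from its collected labels.

    Step 1-1 counts K, the number of distinct labeling results the
    annotators produced for the task; here we additionally use the
    normalized entropy of the label distribution as a difficulty proxy:
    0.0 when all annotators agree, approaching 1.0 as answers scatter.
    This exact formula is an assumption, not the patent's.
    """
    counts = Counter(labels)
    K = len(counts)               # number of distinct labeling results
    if K <= 1:
        return 0.0                # full agreement -> minimal difficulty
    n = len(labels)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(K)  # normalize by max entropy for K labels

# Example: five annotators label the i-th task; more disagreement
# yields a higher score.
print(task_difficulty(["cat", "cat", "cat", "dog", "cat"]))   # more agreement, lower score
print(task_difficulty(["cat", "dog", "bird", "dog", "cat"]))  # more scatter, higher score
```

The normalization by log K keeps scores comparable across tasks whose annotators produced different numbers of distinct labels.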


Abstract

The invention discloses a method for integrating crowdsourced annotation data based on task difficulty and annotator ability. The method rests on two observations: (1) for most tasks, the labels given by an annotator of relatively high ability agree with those of the other annotators; (2) tasks of relatively low difficulty show high consistency among the annotators' labels. Accordingly, the method provides a novel evaluation of task difficulty and an evaluation of annotator ability, builds an integration method for crowdsourced annotation data on these two evaluations, and solves it quickly by iteration. As a result, annotator ability is evaluated objectively and accurately, and the difficulty of various crowdsourced annotation tasks can be evaluated effectively and conveniently. The method is applicable to crowdsourced annotation data of various forms, including but not limited to binary and multi-valued annotation of images, texts, videos, and other tasks.
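The abstract's iterative scheme, alternating between label integration, difficulty evaluation, and ability evaluation, can be sketched as follows. The concrete update rules here (ability-weighted voting, difficulty as one minus the winning vote share, and difficulty-weighted agreement for ability) are illustrative assumptions; the patent only states that the two evaluations are combined and solved by iteration.

```python
from collections import defaultdict

def integrate(annotations, n_iter=20):
    """Minimal sketch of an iterative integration loop in the spirit of
    the abstract: estimate each task's label by ability-weighted voting,
    then re-estimate each annotator's ability from agreement with the
    current consensus, with easy tasks counting more. The exact update
    rules are assumptions, not the patent's formulas.

    annotations: dict mapping (task, worker) -> label
    returns: (consensus label per task, ability per worker)
    """
    tasks = sorted({t for t, _ in annotations})
    workers = sorted({w for _, w in annotations})
    ability = {w: 1.0 for w in workers}

    for _ in range(n_iter):
        # Integration step: ability-weighted vote per task.
        consensus, difficulty = {}, {}
        for t in tasks:
            votes, total = defaultdict(float), 0.0
            for (ti, w), lab in annotations.items():
                if ti == t:
                    votes[lab] += ability[w]
                    total += ability[w]
            best = max(votes, key=votes.get)
            consensus[t] = best
            # A weak majority is treated as a sign of a hard task.
            difficulty[t] = 1.0 - votes[best] / total
        # Ability step: difficulty-weighted agreement with the consensus.
        for w in workers:
            num = den = 0.0
            for (ti, wi), lab in annotations.items():
                if wi == w:
                    weight = 1.0 - difficulty[ti]  # easy tasks count more
                    num += weight * (lab == consensus[ti])
                    den += weight
            ability[w] = num / den if den else 1.0
    return consensus, ability

# Example: three workers label two tasks; worker 'c' dissents on task 0.
annotations = {(0, "a"): "x", (0, "b"): "x", (0, "c"): "y",
               (1, "a"): "z", (1, "b"): "z", (1, "c"): "z"}
consensus, ability = integrate(annotations)
print(consensus)  # task 0 -> "x", task 1 -> "z"
print(ability)    # worker "c" scores lower than "a" and "b"
```

Coupling the two updates is what distinguishes this family of models from plain majority voting: an annotator who dissents only on hard tasks is penalized less than one who dissents on easy ones.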

Description

Technical field

[0001] The invention belongs to the technical field of data labeling, and in particular relates to a method for integrating crowdsourced annotation data based on task difficulty and annotator ability.

Background technique

[0002] High-quality labeled datasets are very important resources in computer research and applications. Algorithms in computer vision, artificial intelligence, and machine learning are mostly trained and optimized on corresponding labeled datasets, so obtaining high-quality, large-scale labeled datasets quickly and efficiently has long been a concern of researchers. The traditional way to obtain labeled datasets is to hire experts to label the data manually. The annotations obtained this way are of high quality, but labeling takes a long time and the financial cost of hiring experts is very large. [0003] In recent years, with the development of crowdsourcing technology, the...


Application Information

IPC(8): G06F19/00
Inventor 王东辉孙欢李亚南蔺越檀熊逵黄鹏程洪高峰徐灿梁建增庄越挺
Owner ZHEJIANG UNIV