A Quality Control Method for Crowdsourcing Classification Data Based on Self-paced Learning

A quality control method for crowdsourced classification data based on self-paced learning. It addresses workers who provide wrong, random, or useless labels, with the aim of reducing the cost of crowdsourcing tasks and achieving high accuracy.

Active Publication Date: 2020-08-14
DALIAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

Because crowdsourcing tasks are open, some malicious workers may deliberately provide wrong information or answer at random, and workers with insufficient ability may provide useless information. Evaluating the quality of workers' work and controlling crowdsourcing quality are therefore important issues.



Examples


Embodiment

[0024] The self-paced learning quality control method for crowdsourced classification data has two parts. The first is the data collection stage, in which labelers may freely choose which items to label. A worker can skip any item they do not want to label or are unsure about. There is also no limit on the number of annotations per worker, so the resulting annotation data may be very unbalanced and sparse. The second part is true label discovery, which alternates iteratively between selecting labels and estimating the hidden true labels, yielding more accurate true labels and estimates of the workers' real abilities.
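To make the "unbalanced and sparse" point concrete, the following is a minimal sketch of how such annotation data might be stored. The worker IDs, item IDs, labels, and helper names are all hypothetical and are not taken from the patent. A skipped item simply leaves no record, so the worker-by-item table has gaps and workers contribute very different numbers of labels.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (worker_id, item_id, label) triples.
# A skipped item produces no triple at all, which is what makes the data sparse.
annotations = [
    ("w1", "q1", "cat"), ("w1", "q2", "dog"), ("w1", "q3", "cat"),
    ("w2", "q1", "cat"),
    ("w3", "q1", "dog"), ("w3", "q3", "cat"),
]

def index_by_item(triples):
    """Group votes as {item: {worker: label}}; unanswered pairs stay absent."""
    by_item = defaultdict(dict)
    for worker, item, label in triples:
        by_item[item][worker] = label
    return dict(by_item)

def labels_per_worker(triples):
    """Count how many labels each worker contributed (shows the imbalance)."""
    return Counter(worker for worker, _, _ in triples)

def density(triples):
    """Fraction of worker-item pairs that actually carry a label."""
    workers = {w for w, _, _ in triples}
    items = {i for _, i, _ in triples}
    return len(triples) / (len(workers) * len(items))

by_item = index_by_item(annotations)
counts = labels_per_worker(annotations)
fill = density(annotations)
```

Here `counts` shows that `w1` labeled three items while `w2` labeled only one, and `fill` is the share of the nine possible worker-item pairs that were actually answered.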

[0025] (1) Data collection stage

[0026] In the data collection phase, the user interaction method shown in Figure 1 is used. When labeling with this method, the user can skip a question, and there is no minimum number of questions a user must answer, so the user can answer at any time and stop at an...


Abstract

The invention discloses a quality control method for crowdsourced classification data based on self-paced learning, and belongs to the technical field of computer science and data mining. The method is used for true label discovery in multi-class crowdsourcing annotation tasks and for identifying malicious workers. First, sample credibility is calculated from the properties of the initial dataset; second, samples are selected; third, the true labels and the workers' abilities are calculated; fourth, further samples are selected according to the updated abilities and true labels; fifth, once all sample points have been selected, a further optimization is performed. Finally, the true answers for the annotations, the workers' abilities, and the identification of malicious and passive workers are obtained at the same time. Experiments show that the method obtains better results than traditional methods.
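The steps in the abstract can be illustrated with a small sketch. The available text does not give the patent's formulas, so everything concrete here is an assumption: credibility is taken to be the ability-weighted vote share of the leading label, the selection threshold falls linearly so that easy items are admitted before hard ones, a worker's ability is their accuracy on the currently selected items, and a worker is flagged when that accuracy is no better than random guessing (1/K for K classes). The function names, the `floor` parameter, and the toy data are hypothetical.

```python
from collections import defaultdict

def weighted_vote(votes, ability):
    """Return (label, credibility) for one item given {worker: label} votes."""
    score = defaultdict(float)
    for worker, label in votes.items():
        score[label] += ability[worker]
    total = sum(score.values())
    best = max(sorted(score), key=score.get)  # sorted() makes ties deterministic
    return best, (score[best] / total if total > 0 else 0.0)

def self_paced_truth(by_item, n_classes, rounds=5, start=0.9, floor=0.05):
    """Assumed easy-to-hard loop: select credible items, then re-estimate ability."""
    workers = {w for votes in by_item.values() for w in votes}
    ability = {w: 1.0 for w in workers}  # initially trust every worker equally
    truth = {}
    for r in range(rounds + 1):
        # The threshold drops to 0 in the last round, so every item is selected.
        threshold = start * (1 - r / rounds)
        truth = {}
        for item, votes in by_item.items():
            label, credibility = weighted_vote(votes, ability)
            if credibility >= threshold:
                truth[item] = label
        # Ability = accuracy on the selected items, floored to stay positive.
        hits = defaultdict(int)
        seen = defaultdict(int)
        for item, label in truth.items():
            for worker, vote in by_item[item].items():
                seen[worker] += 1
                hits[worker] += (vote == label)
        for w in workers:
            if seen[w]:
                ability[w] = max(hits[w] / seen[w], floor)
    # Flag workers who do no better than uniform random guessing.
    suspects = {w for w in workers if ability[w] <= 1.0 / n_classes}
    return truth, ability, suspects

# Hypothetical binary task: a, b, c answer correctly; m always answers wrongly.
toy = {
    "q1": {"a": 1, "b": 1, "c": 1},
    "q2": {"a": 0, "b": 0, "m": 1},
    "q3": {"a": 1, "c": 1, "m": 0},
    "q4": {"b": 0, "m": 1},
    "q5": {"c": 1, "m": 0},
}
truth, ability, suspects = self_paced_truth(toy, n_classes=2)
```

In this toy run, items `q4` and `q5` are ties under plain majority voting. They are resolved only after the easier items have lowered the estimated ability of worker `m`, which is the intuition behind selecting samples from easy to hard.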

Description

Technical field

[0001] The invention belongs to the technical field of computer science and data mining, and relates to a method for controlling the quality of crowdsourced classification data based on self-paced learning.

Background

[0002] Crowdsourcing (also known as human computing or crowd wisdom) means that companies outsource tasks in an open manner to an undefined (generally large) group of people. It is believed that the "wisdom of the majority" is far more accurate than individual judgment. A large number of crowdsourcing platforms distribute tasks to registered workers and then pay corresponding wages according to the labeled data. The data obtained by crowdsourcing is used in a large number of data mining, machine learning, and deep learning tasks, so its quality seriously affects the results of those subsequent learning tasks. In the crowdsourcing distribution system, algorithms for...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/18, G06Q10/06
CPC: G06F17/18, G06Q10/063112, G06Q10/06395
Inventors: 张宪超, 史珩, 梁文新, 刘馨月
Owner: DALIAN UNIV OF TECH