A Quality Control Method for Crowdsourcing Classification Data Based on Self-paced Learning

A quality control method for crowdsourced classification data, applied in the field of crowdsourcing classification data quality control based on self-paced learning. It addresses workers who provide wrong, random, or useless labels, and aims to reduce the expenditure of crowdsourcing tasks while achieving high accuracy.

Active Publication Date: 2020-08-14

DALIAN UNIV OF TECH

4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Because crowdsourcing tasks are open, some malicious workers may deliberately provide wrong or random information, and workers with insufficient ability may provide useless information. Evaluating the quality of workers' output and controlling crowdsourcing quality are therefore important issues.

Embodiment

[0024] The self-paced learning quality control method for crowdsourced classification data is divided into two parts. The first part is the data collection stage, in which labelers freely choose which items to label. A worker can skip any item that they do not want to label or are not sure about. There is also no limit on the number of annotations per worker, so the resulting annotation data may be very unbalanced and sparse. The second part is the discovery of true labels. This part iterates between selecting labels and estimating the hidden true labels, to obtain more accurate true labels and the true abilities of the workers.
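
The sparse, unbalanced annotation data described in [0024] can be illustrated with a small sketch. The record layout `(worker_id, item_id, label)`, the sample values, and the helper names `labels_per_item` and `annotation_counts` are assumptions made for illustration; the patent text shown here does not specify a storage format.

```python
from collections import Counter, defaultdict

# Hypothetical sparse annotation records: (worker_id, item_id, label).
# A skipped item simply has no record, so no placeholder value is needed.
annotations = [
    ("w1", "q1", "cat"), ("w1", "q2", "dog"),
    ("w2", "q1", "cat"),
    ("w3", "q1", "dog"), ("w3", "q2", "dog"), ("w3", "q3", "bird"),
]

def labels_per_item(records):
    """Group the (worker, label) votes by item."""
    by_item = defaultdict(list)
    for worker, item, label in records:
        by_item[item].append((worker, label))
    return dict(by_item)

def annotation_counts(records):
    """Count how many items each worker chose to label."""
    return Counter(worker for worker, _, _ in records)

by_item = labels_per_item(annotations)
counts = annotation_counts(annotations)
```

In this toy data, `w3` labels three items while `w2` labels only one, and `q3` receives a single vote, which is the kind of imbalance and sparsity the paragraph refers to.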

[0025] (1) Data collection stage

[0026] In the data collection phase, the user interaction method shown in Figure 1 is used. When labeling with this method, the user can skip a question, and there is no minimum number of questions that the user must answer, so that the user can answer at any time and stop at an...

Abstract

The invention discloses a quality control method for crowdsourced classification data based on self-paced learning, and belongs to the technical field of computer science data mining. The method is used for true label discovery in multi-class crowdsourcing annotation tasks and for the identification of malicious workers. The method proceeds as follows: first, sample credibility is calculated from the properties of the initial dataset; second, samples are selected; third, the true labels and the abilities of the workers are calculated; fourth, further samples are selected according to the updated abilities and true labels; fifth, after all sample points have been selected, further optimization is performed. Finally, the true answers for the annotations, the abilities of the workers, and the identification of malicious and passive workers are obtained at the same time. Experiments show that the method obtains better results than traditional methods.
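
The steps listed in the abstract can be sketched as a loop that starts from the most credible items and gradually admits the rest. This is only a minimal sketch under stated assumptions, not the patented formulas, which are not shown in this excerpt: credibility is taken to be the share of ability-weighted votes won by the leading label, ability is a smoothed accuracy, and all function names, the `step` schedule, and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict

def weighted_vote(votes, ability):
    """Return (label, credibility) for one item from its (worker, label) votes."""
    scores = defaultdict(float)
    for worker, label in votes:
        scores[label] += ability[worker]
    best = max(sorted(scores), key=lambda lab: scores[lab])  # ties: first label
    return best, scores[best] / sum(scores.values())

def estimate_ability(by_item, truth, workers):
    """Smoothed accuracy of each worker on the currently selected items."""
    correct, answered = defaultdict(int), defaultdict(int)
    for item, label in truth.items():
        for worker, vote in by_item[item]:
            answered[worker] += 1
            correct[worker] += int(vote == label)
    return {w: (correct[w] + 1) / (answered[w] + 2) for w in workers}

def self_paced_truth_discovery(by_item, step=0.5, final_rounds=2):
    workers = {w for votes in by_item.values() for w, _ in votes}
    ability = {w: 1.0 for w in workers}  # assumption: equal trust at the start
    n_items, fraction = len(by_item), step
    while True:
        voted = {i: weighted_vote(v, ability) for i, v in by_item.items()}
        # Select the k most credible ("easiest") items; k grows every round.
        k = min(n_items, max(1, round(fraction * n_items)))
        ranked = sorted(voted, key=lambda i: (-voted[i][1], i))
        truth = {i: voted[i][0] for i in ranked[:k]}
        ability = estimate_ability(by_item, truth, workers)
        if k == n_items:
            break
        fraction += step
    for _ in range(final_rounds):  # further optimization on all items
        truth = {i: weighted_vote(v, ability)[0] for i, v in by_item.items()}
        ability = estimate_ability(by_item, truth, workers)
    return truth, ability

# Hypothetical toy data: three reliable workers and one who always answers wrongly.
by_item = {
    "q1": [("g1", "A"), ("g2", "A"), ("g3", "A"), ("bad", "B")],
    "q2": [("g1", "B"), ("g2", "B"), ("bad", "A")],
    "q3": [("g1", "A"), ("g3", "A"), ("bad", "B")],
    "q4": [("g1", "B"), ("bad", "A")],
}
truth, ability = self_paced_truth_discovery(by_item)
# Assumed threshold: ability below 0.5 is treated as suspicious.
suspicious = sorted(w for w, a in ability.items() if a < 0.5)
```

Item `q4` is a tie under equal trust, so it is selected last; by then the abilities learned from the easier items let the reliable worker's vote outweigh the unreliable one. This deferral of ambiguous samples is the general idea behind self-paced selection.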

Description

Technical field

[0001] The invention belongs to the technical field of computer science data mining, and relates to a method for controlling the quality of crowdsourced classification data based on self-paced learning.

Background

[0002] Crowdsourcing (also known as human computing or crowd wisdom) means that companies and enterprises outsource tasks, in an open manner, to an undefined and generally large group of people. It is believed that the "wisdom of the majority" is far more accurate than individual judgment. A large number of crowdsourcing platforms distribute tasks to registered workers, and then pay corresponding wages according to the labeled data. The data obtained by crowdsourcing is applied to a large number of data mining, machine learning, and deep learning tasks, so the quality of crowdsourced data seriously affects the results of subsequent learning tasks. In the crowdsourcing distribution system, algorithms for...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F17/18, G06Q10/06

CPC: G06F17/18, G06Q10/063112, G06Q10/06395

Inventors: 张宪超, 史珩, 梁文新, 刘馨月

Owner: DALIAN UNIV OF TECH
