Sample labeling method based on crowdsourcing mode

A sample labeling technology based on a crowdsourcing mode, applied in the field of data mining. It addresses two problems of existing methods: they do not consider group intelligence, and they ignore differences in labeling accuracy between annotators.

Pending Publication Date: 2021-02-09
WUHAN UNIV
AI Technical Summary

Problems solved by technology

Existing crowdsourcing labeling methods do not consider group intelligence, even though each individual annotator has a different labeling accuracy for different types of instances.

Embodiment Construction

[0035] To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the examples. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.

[0036] To address active learning on high-dimensional multivariate time series, the present invention proposes a high-confidence, low-cost crowdsourcing labeling strategy. After the samples that need labeling are selected, an adaptive labeler selection algorithm finds the set of labelers that meets the confidence threshold at the lowest cost, achieving reliable and low-cost labeling of unlabeled samples. The invention is generally applicable to industrial sensor data, financial data, medical data (such as electrocardiogram data sets), server system monitori...
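The labeler selection described in [0036] can be illustrated with a small sketch. The text here does not give the patent's confidence formula or its adaptive search, so the following makes loud assumptions: labels are binary, annotators err independently, each annotator has a single accuracy value (the patent notes accuracy actually varies by instance type), confidence is the probability that a strict majority vote is correct, and the search is brute force over subsets. The names `majority_confidence`, `select_annotators` and the example pool are hypothetical.

```python
from itertools import combinations


def majority_confidence(accuracies):
    """Probability that a strict majority of independent annotators is correct.

    Assumption: binary labels and independent errors (not stated in the source).
    """
    # dp[k] = probability that exactly k of the annotators seen so far are correct
    dp = [1.0]
    for p in accuracies:
        nxt = [0.0] * (len(dp) + 1)
        for k, prob in enumerate(dp):
            nxt[k] += prob * (1.0 - p)   # this annotator is wrong
            nxt[k + 1] += prob * p       # this annotator is right
        dp = nxt
    n = len(accuracies)
    return sum(prob for k, prob in enumerate(dp) if 2 * k > n)


def select_annotators(annotators, threshold):
    """Return (names, total_cost, confidence) of the cheapest qualifying set.

    annotators maps a name to an (accuracy, cost) pair. Returns None when no
    subset reaches the threshold. Brute force, so only suitable for small pools.
    """
    best = None
    names = sorted(annotators)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            conf = majority_confidence([annotators[a][0] for a in subset])
            cost = sum(annotators[a][1] for a in subset)
            if conf >= threshold and (best is None or cost < best[1]):
                best = (subset, cost, conf)
    return best


# Hypothetical pool: one expensive expert and three cheap, less accurate annotators.
pool = {"a": (0.95, 10.0), "b": (0.8, 2.0), "c": (0.8, 2.0), "d": (0.8, 2.0)}
result = select_annotators(pool, threshold=0.85)
print(result)
```

In this made-up pool, the three cheap annotators together reach the threshold at a lower total cost than the single expert, which is the trade-off the paragraph describes.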

Abstract

The invention discloses a sample labeling method based on a crowdsourcing mode, comprising the following steps: 1) obtaining an unlabeled multivariate time series data set and extracting from it the samples that need to be labeled; 2) over all annotators, based on each annotator's labeling accuracy and labeling cost, selecting the annotator set that reaches a confidence threshold at the lowest cost as the cost-benefit crowdsourcing labeling model; 3) for the extracted samples, obtaining a labeling result from the cost-benefit crowdsourcing labeling model, adding the labeled samples to the labeled data set, assigning the unlabeled reverse nearest neighbors of each labeled sample to the same class as that sample, and adding them to the labeled data set to obtain an updated labeled data set; and 4) computing a stopping criterion and, once the stop condition is reached, obtaining the final labeled data set for the multivariate time series data set. The method achieves reliable and low-cost labeling of unlabeled samples.
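The reverse nearest neighbor step in 3) can be sketched as follows. This is an illustration under assumptions, not the patent's procedure: the reverse nearest neighbors of a sample are taken to be the samples whose single nearest neighbor is that sample, all series have the same shape, and plain Euclidean distance stands in for whatever time series distance the patent uses. The function names and toy data are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equally shaped series (lists of rows).

    Assumption: the source does not name a distance measure.
    """
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


def propagate_to_reverse_neighbors(samples, labels, newly_labeled):
    """Copy labels from newly labeled samples to their unlabeled reverse nearest neighbors.

    samples: list of series; labels: dict index -> label;
    newly_labeled: set of indices that were just labeled (all present in labels).
    Returns a new dict and leaves `labels` unchanged.
    """
    updated = dict(labels)
    for i, series in enumerate(samples):
        if i in labels:
            continue  # already labeled
        # Nearest neighbor of sample i among all other samples.
        nearest = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: distance(series, samples[j]),
        )
        # i is a reverse nearest neighbor of `nearest`; inherit its label
        # only if `nearest` is one of the newly labeled samples.
        if nearest in newly_labeled:
            updated[i] = labels[nearest]
    return updated


# Toy data: five series, each with 2 time steps and 2 variables.
samples = [
    [[0.0, 0.0], [0.0, 0.0]],  # 0: labeled "A"
    [[0.1, 0.0], [0.0, 0.0]],  # 1: nearest neighbor is 0
    [[5.0, 5.0], [5.0, 5.0]],  # 2: labeled "B"
    [[5.2, 5.0], [5.0, 5.0]],  # 3: nearest neighbor is 2
    [[1.0, 0.0], [0.0, 0.0]],  # 4: nearest neighbor is 1, which is unlabeled
]
labels = {0: "A", 2: "B"}
updated = propagate_to_reverse_neighbors(samples, labels, newly_labeled={0, 2})
print(updated)
```

Sample 4 stays unlabeled in this sketch because its nearest neighbor is itself unlabeled; it would be a candidate for a later iteration of the loop, before the stopping criterion in step 4) is met.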

Description

Technical field

[0001] The invention relates to data mining technology, in particular to a sample labeling method based on crowdsourcing mode.

Background technique

[0002] A high-quality dataset is crucial for model training. However, in real life, the amount of labeled data is often small, and efficient and accurate labeling of data is time-consuming and expensive. In response to this problem, active learning has gradually become a research hotspot, and one of the keys is the effective labeling of unlabeled samples. Traditional machine learning algorithms often do not take into account the different accuracy of labelers. In order to improve the accuracy of labeling, researchers have proposed some crowdsourcing labeling algorithms, the most important problem of which is the labeling error of the labeler.

[0003] In order to improve the performance of classifiers trained based on crowdsourced labeled data, Zhang et al. proposed a meta-learning integration method for group...

Application Information

IPC(8): G06K9/62
CPC: G06F18/24, G06F18/214
Inventor: 何国良, 王晗, 黄成瑞
Owner: WUHAN UNIV