Data labeling system and method based on intelligent distribution algorithm

An intelligent-distribution data labeling technology, applied in the fields of neural learning methods, computing, and computer components. It addresses problems such as uneven skill levels among labelers and the high error rate of manual processing, and it aims to provide an efficient and safe data management mode, a lower manual processing error rate, and assured labeling quality.

Pending Publication Date: 2019-08-30
武汉黑松露科技有限公司

AI Technical Summary

Problems solved by technology

[0003] First, managing a large labeling workforce is a major challenge for the company: while developing products, it must also devote considerable effort to managing a large number of labelers;
[0004] Second, the salaries of a large number of full-time labelers are a heavy burden for start-up companies and research laboratories;
[0005] The data is labeled by human labelers whose labeling skill varies, resulting in a high error rate in manual processing.



Examples


Embodiment 1

[0031] As shown in Figures 1-2, a data processing system based on an intelligent allocation algorithm comprises a data analysis module 1, a feature acquisition module 2, and an intelligent allocation module 3. The output of the data analysis module 1 is connected to the input of the feature acquisition module 2, and the output of the feature acquisition module 2 is connected to the input of the intelligent allocation module 3;
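The module chain in [0031] can be pictured as three stages wired output to input. The following is a minimal sketch of that wiring only, not the patented implementation: the class `LabelingPipeline`, the three `toy_*` stage functions, the feature names `length` and `difficulty`, and the pool names `senior` and `junior` are all hypothetical placeholders introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]
Assignment = Dict[str, List[str]]


@dataclass
class LabelingPipeline:
    """Chains the three modules: analysis -> feature acquisition -> allocation."""
    analyze: Callable[[List[str]], List[Features]]              # stands in for module 1
    acquire: Callable[[List[Features]], List[Features]]         # stands in for module 2
    allocate: Callable[[List[str], List[Features]], Assignment]  # stands in for module 3

    def run(self, items: List[str]) -> Assignment:
        data_features = self.analyze(items)             # output of module 1
        enriched = self.acquire(data_features)          # fed into module 2
        return self.allocate(items, enriched)           # fed into module 3


# Placeholder stages (assumptions): the source does not specify these computations.
def toy_analyze(items: List[str]) -> List[Features]:
    return [{"length": float(len(item.split()))} for item in items]


def toy_acquire(features: List[Features]) -> List[Features]:
    longest = max((f["length"] for f in features), default=1.0) or 1.0
    return [{**f, "difficulty": f["length"] / longest} for f in features]


def toy_allocate(items: List[str], features: List[Features]) -> Assignment:
    assignment: Assignment = {"senior": [], "junior": []}
    for item, f in zip(items, features):
        pool = "senior" if f["difficulty"] >= 0.5 else "junior"
        assignment[pool].append(item)
    return assignment
```

Passing the stages in as callables keeps the connection order fixed while leaving each module replaceable, which mirrors the output-to-input coupling described in the embodiment.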

[0032] The data analysis module 1 includes a model database, which stores a plurality of different deep learning models;

[0033] The data analysis module 1 applies different deep learning models to different tasks, combining a coordination algorithm based on the Attention mechanism with a multi-model fusion scheme to maximize the potential of the module and obtain the basic quantitative characteristics of the given data. The team will analyze acco...

Embodiment 2

[0043] For the data labeling system and method based on an intelligent allocation algorithm proposed in the above embodiment, the algorithm evaluation is summarized as follows (note: all percentages in the charts below are normalized to 100%):

[0044] Using the intelligent allocation algorithm can reduce the error rate of manual data processing. In manual processing tasks on text data, the manual error rate can be reduced by about 20-30%. The specific performance depends on two controllable factors: the selection of processing personnel (annotators) and the amount of advance data (see step S1 in Embodiment 1);

[0045] Comparative experiment settings: the data to be tagged are sentences extracted from various abstracts, such as "the remaining stars, the sadness that was hurt by you cannot be erased." The tagger needs to segment each sentence into words and mark the part of speech of each word, producing output such as "residual_a 's_u star_n ,_wp wipe_v not_d go_v was_p you_r hurt_v 's_u sad_n ._wp"; this t...
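To make the notion of a manual error rate on this task concrete, here is a minimal sketch of one way to score a labeler's segmentation and part-of-speech output against a reference. It assumes the `word_tag` format with space-separated tokens shown in the example, and it counts a token as correct only when both its character span and its tag match the reference. The function names and this scoring rule are assumptions for illustration; the source does not state how its error rate is computed.

```python
from typing import List, Set, Tuple


def parse_tagged(line: str) -> List[Tuple[str, str]]:
    """Split 'word_tag word_tag ...' into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if not sep:               # token without a tag: keep the word, empty tag
            word, tag = token, ""
        pairs.append((word, tag))
    return pairs


def to_spans(pairs: List[Tuple[str, str]]) -> Set[Tuple[int, int, str]]:
    """Convert pairs to (start, end, tag) character spans over the unsegmented text."""
    spans = set()
    pos = 0
    for word, tag in pairs:
        spans.add((pos, pos + len(word), tag))
        pos += len(word)
    return spans


def error_rate(labeled: str, gold: str) -> float:
    """Fraction of reference tokens the labeler missed (wrong boundary or wrong tag)."""
    gold_spans = to_spans(parse_tagged(gold))
    if not gold_spans:
        return 0.0
    labeled_spans = to_spans(parse_tagged(labeled))
    return 1.0 - len(gold_spans & labeled_spans) / len(gold_spans)
```

Comparing character spans rather than token positions means a single segmentation mistake does not shift every later token out of alignment.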


Abstract

The invention discloses a data labeling system and method based on an intelligent distribution algorithm, and specifically relates to the field of data processing. The system comprises a data analysis module, a feature acquisition module and an intelligent distribution module. The output end of the data analysis module is connected with the input end of the feature acquisition module, and the output end of the feature acquisition module is connected with the input end of the intelligent distribution module. The method comprises the following processing steps: using the data analysis module to screen a small set of representative and instructive key data as "advance data"; having labelers perform trial labeling, accurate labeling and analysis on the advance data to obtain a "standard answer", then dynamically matching against it to extract each labeler's individual labeling features; and using the intelligent distribution module to intelligently distribute the remaining data. By using the intelligent distribution algorithm, the method can reduce the manual processing error rate of data; in manual processing tasks on text data, the manual error rate can be reduced by about 20-30%.
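The method steps in the abstract (trial labeling on a small advance set, comparison with a standard answer, then distribution of the remaining data) can be illustrated with a simple stand-in. The sketch below assumes that each item carries a category, that a labeler's "labeling feature" is simply their accuracy per category on the advance data, and that distribution is a greedy assignment under a per-labeler capacity. All of these choices, and the names `labeler_profiles` and `allocate`, are hypothetical; the source does not disclose the actual allocation algorithm in this excerpt.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def labeler_profiles(
    trial_labels: Dict[str, Dict[str, str]],   # labeler -> {item_id: label}
    standard_answers: Dict[str, str],          # item_id -> reference label
    categories: Dict[str, str],                # item_id -> category (assumed to be known)
) -> Dict[str, Dict[str, float]]:
    """Per-labeler accuracy by category, measured on the advance data."""
    correct = defaultdict(lambda: defaultdict(int))
    total = defaultdict(lambda: defaultdict(int))
    for labeler, answers in trial_labels.items():
        for item_id, label in answers.items():
            cat = categories[item_id]
            total[labeler][cat] += 1
            if label == standard_answers[item_id]:
                correct[labeler][cat] += 1
    return {
        labeler: {cat: correct[labeler][cat] / n for cat, n in cats.items()}
        for labeler, cats in total.items()
    }


def allocate(
    remaining: List[Tuple[str, str]],          # (item_id, category)
    profiles: Dict[str, Dict[str, float]],
    capacity: int,
) -> Dict[str, str]:
    """Greedily give each item to the most accurate labeler who still has capacity."""
    load = {labeler: 0 for labeler in profiles}
    assignment: Dict[str, str] = {}
    for item_id, cat in remaining:
        candidates = [lab for lab in profiles if load[lab] < capacity]
        if not candidates:
            continue  # no labeler has capacity left; the item stays unassigned
        # Prefer higher accuracy on this category, then the lighter current load.
        best = max(candidates, key=lambda lab: (profiles[lab].get(cat, 0.0), -load[lab]))
        assignment[item_id] = best
        load[best] += 1
    return assignment
```

For example, with two labelers where one is accurate only on sentiment items and the other only on part-of-speech items, `allocate` routes each remaining item to the labeler whose trial accuracy on that category is higher, until the capacity limit is reached.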

Description

Technical field

[0001] The present invention relates to the technical field of data processing, and more specifically to a data labeling system and method based on an intelligent allocation algorithm.

Background technique

[0002] At present, most AI laboratories and start-up AI companies that employ a large workforce for data labeling in the early stages of development have to face the following two situations:

[0003] First, managing a large labeling workforce is a major challenge for the company: while developing products, it must also devote considerable effort to managing a large number of labelers;

[0004] Second, the salaries of a large number of full-time labelers are a heavy burden for start-up companies and research laboratories;

[0005] The data is labeled by human labelers whose labeling skill varies, resulting in a high error rate in manual processing.

Contents of the invention

[0006]...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/04, G06N3/08, G06F18/2155
Inventor: 裴正奇, 聂泽宁
Owner: 武汉黑松露科技有限公司