
Corpus data labeling method based on swarm intelligence

A corpus data technology, applied in the field of corpus labeling, that reduces the labeling error rate, shortens the development and labeling cycles, and improves labeling accuracy.

Inactive Publication Date: 2018-11-23
SHENZHEN QIANHAI YYD ROBOT CO LTD
3 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

[0004] To overcome the shortcomings of traditional labeling methods, the present invention proposes a corpus data labeling method and system based on crowd intelligence.



Examples


Embodiment 1

[0035] Referring to Figure 1, the labeling system model is constructed according to the corpus data labeling process. The model has four main parts: labeling model 1, labeling data generator 2, data to be labeled 3, and labeled data 4.

[0036] Referring to Figures 2, 3, 4, and 5, a corpus data labeling method based on crowd intelligence is proposed on the basis of the labeling system model. It includes the following steps:

[0037] S1. At the start of labeling, the user inputs the data to be labeled 3, and the labeling model 1 and the labeling data generator 2 are initialized at the same time;

[0038] S2. The labeling data generator 2 extracts a certain amount of data from the data to be labeled 3 and the labeled data 4 to generate the labeling data 7 presented to the annotators;

[0039] S3. The labeling model 1 performs iterative training on the labeled data 4 and then automatically labels the data to be labeled 3 to generate the la...
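The steps above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the text of step S3 is cut off here, so the final "annotator labels or confirms, then the result is stored" part follows the abstract. `MajorityModel`, `generate_batch`, `labeling_round`, the `annotate` callback, and the batch sizes `n_new` and `n_review` are all hypothetical names chosen for illustration, and the "model" is a trivial most-common-label predictor standing in for a real trainable labeling model.

```python
import random
from collections import Counter


class MajorityModel:
    """Toy stand-in for labeling model 1 (hypothetical): predicts the most common label."""

    def __init__(self, default="UNKNOWN"):
        self.default = default
        self.most_common = None

    def train(self, labeled):
        # "Iterative training" here simply recounts the labels in the labeled data (4).
        if labeled:
            counts = Counter(label for _, label in labeled)
            self.most_common = counts.most_common(1)[0][0]

    def predict(self, text):
        return self.most_common if self.most_common is not None else self.default


def generate_batch(to_label, labeled, n_new, n_review, rng):
    """S2: draw items from the data to be labeled (3) and the labeled data (4)."""
    new_items = rng.sample(to_label, min(n_new, len(to_label)))
    review_idx = rng.sample(range(len(labeled)), min(n_review, len(labeled)))
    return new_items, review_idx


def labeling_round(model, to_label, labeled, annotate, rng, n_new=2, n_review=1):
    """One round: train, pre-label, then let the annotator label or confirm."""
    model.train(labeled)  # S3: train on the labeled data
    new_items, review_idx = generate_batch(to_label, labeled, n_new, n_review, rng)
    for idx in review_idx:
        # Previously labeled items are shown again so the annotator can correct them.
        text, old_label = labeled[idx]
        labeled[idx] = (text, annotate(text, old_label))
    for text in new_items:
        suggestion = model.predict(text)  # S3: automatic pre-label
        to_label.remove(text)
        labeled.append((text, annotate(text, suggestion)))  # store the confirmed label
```

Calling `labeling_round` repeatedly until `to_label` is empty moves every item into `labeled`. Mixing already-labeled items into each batch is one way to let annotators revisit earlier labels, which is how this sketch reads the role of the labeled data 4 in step S2.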

Embodiment 2

[0044] Referring to Figures 6, 7, and 8, a corpus data labeling system based on crowd intelligence implements the method of Embodiment 1. The system includes a Web background server 100, a GPU server 200, and multiple browser-equipped computers 300. The Web background server 100 comprises a labeling data generating unit 101, a labeling data storage unit 102, and an instruction sending unit 103. Annotators use the browser-equipped computers 300 to log in to the data labeling system, input the data to be labeled 3, and label or confirm the labeling data 7. The GPU server 200 runs the labeling model unit 201 and performs iterative training according to the labeled data 4 and the instructions provided by the Web background server 100. The labeling model unit 201 is used to label the labeling data 7 and generates the labeling model 1 during the R&D process.

[0045] The input of the labeling model unit 201 is a sentence or segment of the corpus data to be labeled, and the output i...
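The division of work between the Web background server 100 and the GPU server 200 might be sketched as follows. This is an assumption-laden illustration, not the system described in the patent: the instruction names `"train"` and `"predict"`, the class and method names, and the placeholder label `"UNLABELED"` are invented here, and a direct method call stands in for whatever network transport the real system uses.

```python
from dataclasses import dataclass, field


@dataclass
class LabelingModelUnit:
    """Stand-in for labeling model unit 201 running on GPU server 200."""
    trained_on: int = 0

    def handle(self, instruction, payload):
        # Instruction names are hypothetical; the source does not list them.
        if instruction == "train":
            self.trained_on = len(payload)  # placeholder for real training
            return {"status": "trained", "examples": self.trained_on}
        if instruction == "predict":
            return {"status": "ok", "labels": ["UNLABELED" for _ in payload]}
        raise ValueError(f"unknown instruction: {instruction}")


@dataclass
class WebBackendServer:
    """Stand-in for Web background server 100."""
    gpu: LabelingModelUnit
    store: list = field(default_factory=list)  # labeling data storage unit 102

    def save(self, text, label):
        # Storage unit 102: keep each confirmed (text, label) pair.
        self.store.append((text, label))

    def send(self, instruction, payload):
        # Instruction sending unit 103: forward an instruction to the GPU server.
        return self.gpu.handle(instruction, payload)

    def retrain(self):
        # Ask the GPU server to train on everything stored so far.
        return self.send("train", self.store)
```

For example, after `backend = WebBackendServer(gpu=LabelingModelUnit())` and `backend.save("hello", "greeting")`, calling `backend.retrain()` forwards the stored pair to the model unit. Keeping training behind an instruction interface mirrors the text's statement that the GPU server trains "according to the labeled data 4 and the instructions provided by the Web background server 100".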



Abstract

The invention discloses a corpus data labeling method based on swarm intelligence. A user inputs the data to be labeled, and a labeling model and a labeling data generator are initialized. The labeling data generator extracts a certain amount of data from the data to be labeled and the labeled data to generate labeling data for the annotators. The labeling model, which is produced during the research and development process, is trained iteratively on the labeled data and then automatically labels the data to be labeled, generating the model's automatically labeled data. The annotators label or confirm the labeling data, and the results are stored. Based on this method, the invention also provides a corpus data labeling system based on swarm intelligence. The method and system combine the development process with the labeling process, which can significantly shorten the research and development cycle and the labeling cycle and improve the accuracy of data labeling.

Description

Technical field

[0001] The present invention relates to the field of corpus labeling, in particular to a corpus data labeling method and system based on crowd intelligence.

Background

[0002] With the rapid development of the Internet and artificial intelligence, the need for data labeling is becoming increasingly urgent. Existing labeling methods either organize the corpus in Word, Excel, or other text formats, or rely on a specially developed web page system. Annotators then label the corpus in text or web page form.

[0003] The existing labeling methods have several problems: annotators work in isolation, each labeling their own data, and labeling standards differ between annotators; the labeling and R&D processes are independent of each other and cannot proceed collaboratively; and labeling errors either go uncorrected or can only be corrected by a second round of labeling. The above problems li...


Application Information

IPC(8): G06F17/24, G06F17/30
CPC: G06F40/166
Inventor: 肖中华
Owner: SHENZHEN QIANHAI YYD ROBOT CO LTD