Corpus data labeling method based on swarm intelligence
A corpus data labeling technology, applied in the field of corpus labeling, that reduces the labeling error rate, shortens the development and labeling cycles, and improves labeling accuracy.
Examples
Embodiment 1
[0035] As shown in Figure 1, the annotation system model is constructed according to the corpus data annotation process. The model has four main parts: annotation model 1, annotation data generator 2, data to be annotated 3, and annotated data 4.
[0036] Referring to Figures 2, 3, 4, and 5, a corpus data labeling method based on crowd intelligence is proposed on the basis of the annotation system model. It includes the following steps:
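The four parts named above can be pictured as one small data structure. The following is a minimal Python sketch, not the patent's implementation: the class name `AnnotationSystemModel`, the field names, the callable signatures, and the use of plain strings for labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

LabeledItem = Tuple[str, str]  # (text, label); the label type is an assumption


@dataclass
class AnnotationSystemModel:
    # Part 1: annotation model, reduced here to a text -> label callable.
    annotation_model: Callable[[str], str]
    # Part 2: annotation data generator, which picks what annotators see next.
    data_generator: Callable[[List[str], List[LabeledItem]], List[str]]
    # Part 3: data to be annotated.
    to_annotate: List[str] = field(default_factory=list)
    # Part 4: annotated data.
    annotated: List[LabeledItem] = field(default_factory=list)

    def record(self, text: str, label: str) -> None:
        """Move one item from the to-annotate pool into the annotated pool."""
        self.to_annotate.remove(text)
        self.annotated.append((text, label))
```

Keeping the two data pools as fields next to the model and the generator mirrors the way the text treats all four parts as one system model.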
[0037] S1. At the initial stage of labeling, the user enters the data to be labeled 3, and initializes the labeling model 1 and the labeling data generator 2 at the same time;
[0038] S2. The annotation data generator 2 extracts a certain amount of data from the data to be annotated 3 and the annotated data 4 to generate annotation data 7 for the annotators;
[0039] S3. The labeling model 1 performs iterative training based on the labeled data 4, and then automatically labels the data to be labeled 3 to generate the la...
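Steps S1 to S3, as far as they are given here, can be sketched as one labeling round. This is a hedged illustration only: `ToyLabelingModel` is a word-vote stand-in for labeling model 1 (the source does not specify the model), `annotate` is a hypothetical callable that plays the human annotator, and the batch sizes and the choice to let annotators re-confirm previously labeled items are assumptions.

```python
import random
from collections import Counter, defaultdict


class ToyLabelingModel:
    """Stand-in for labeling model 1: votes by word/label co-occurrence."""

    def __init__(self):
        self.word_votes = defaultdict(Counter)
        self.default = None

    def train(self, labeled):
        # Retrain from scratch on all labeled data 4 each round.
        self.word_votes.clear()
        totals = Counter()
        for text, label in labeled:
            totals[label] += 1
            for word in text.lower().split():
                self.word_votes[word][label] += 1
        self.default = totals.most_common(1)[0][0] if totals else None

    def predict(self, text):
        votes = Counter()
        for word in text.lower().split():
            votes.update(self.word_votes.get(word, {}))
        return votes.most_common(1)[0][0] if votes else self.default


def run_round(model, to_label, labeled, annotate, rng, n_new=2, n_review=1):
    # S2: the generator draws from both the unlabeled and the labeled pool.
    new_items = rng.sample(to_label, min(n_new, len(to_label)))
    review_items = rng.sample(labeled, min(n_review, len(labeled)))
    for text in new_items:
        to_label.remove(text)
        labeled.append((text, annotate(text)))
    for text, old_label in review_items:
        # Assumption: a reviewed item is confirmed or corrected in place.
        labeled[labeled.index((text, old_label))] = (text, annotate(text))
    # S3: train on the labeled data, then auto-label what is still unlabeled.
    model.train(labeled)
    return {text: model.predict(text) for text in to_label}


# S1: the user supplies the data to be labeled; model and RNG are initialized.
to_label = ["good movie", "bad movie", "good food", "bad service", "good day"]
labeled = []
model = ToyLabelingModel()
rng = random.Random(0)
suggestions = run_round(
    model, to_label, labeled, lambda t: "pos" if "good" in t else "neg", rng
)
```

After the call, `suggestions` maps each still-unlabeled text to the model's proposed label, which a later round could present to annotators for confirmation.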
Embodiment 2
[0044] Referring to Figures 6, 7, and 8, a corpus data labeling system based on crowd intelligence implements the method of Embodiment 1. The system includes a Web background server 100, a GPU server 200, and multiple browser-equipped computers 300. The Web background server 100 comprises a labeling data generating unit 101, a labeling data storage unit 102, and an instruction sending unit 103. The browser-equipped computers 300 are used by labeling personnel to log in to the data labeling system, input the data to be labeled 3, and label or confirm the labeling data 7. The GPU server 200 runs the labeling model unit 201 and performs iterative training according to the labeled data 4 and the instructions provided by the Web background server 100. The labeling model unit 201 is used to label the labeling data 7 and, during the R&D process, generates the labeling model 1.
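The division of work between the Web background server 100 and the GPU server 200 can be sketched in a single process. This is an assumption-laden illustration, not the described system: there is no real networking, the instruction format (a dict with a `"type"` key) is hypothetical, and the majority-label "model" is only a placeholder for whatever labeling model unit 201 actually runs.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LabelingDataStorageUnit:  # unit 102
    to_label: List[str] = field(default_factory=list)
    labeled: List[Tuple[str, str]] = field(default_factory=list)


class LabelingModelUnit:  # unit 201, with a placeholder majority-label model
    def __init__(self):
        self.rounds_trained = 0
        self.majority = None

    def train(self, labeled):
        counts = Counter(label for _, label in labeled)
        self.majority = counts.most_common(1)[0][0] if counts else None
        self.rounds_trained += 1

    def label(self, texts):
        return [(text, self.majority) for text in texts]


class GPUServer:  # server 200
    def __init__(self):
        self.model_unit = LabelingModelUnit()

    def handle(self, instruction):
        # Hypothetical instruction format: {"type": ..., plus a payload key}.
        if instruction["type"] == "train":
            self.model_unit.train(instruction["labeled"])
            return {"rounds": self.model_unit.rounds_trained}
        if instruction["type"] == "label":
            return {"suggestions": self.model_unit.label(instruction["texts"])}
        raise ValueError("unknown instruction type")


class WebBackgroundServer:  # server 100
    def __init__(self, gpu_server):
        self.storage = LabelingDataStorageUnit()
        self.gpu_server = gpu_server

    def send_instruction(self, instruction):  # role of unit 103
        return self.gpu_server.handle(instruction)

    def generate_labeling_data(self, n):  # role of unit 101
        texts = self.storage.to_label[:n]
        reply = self.send_instruction({"type": "label", "texts": texts})
        return reply["suggestions"]

    def submit(self, text, label):
        # Called when a browser client labels or confirms one item.
        self.storage.to_label.remove(text)
        self.storage.labeled.append((text, label))
        self.send_instruction({"type": "train", "labeled": self.storage.labeled})
```

In this sketch a browser client would call `generate_labeling_data` to fetch pre-labeled items and `submit` to return a confirmed label, and each submission triggers another training instruction to the GPU server.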
[0045] The input of the labeling model unit 201 is a sentence or segment of the corpus data to be labeled, and the output i...