Annotated data processing method and device

A data labeling processing technology, applied in the field of data labeling, that addresses reduced labeling efficiency, excessively large labeling information, and imbalanced to-be-labeled data, with the effects of improving the training effect, ensuring accuracy, and improving the prediction effect.

Pending Publication Date: 2019-12-24
大箴(杭州)科技有限公司
Cites: 0; Cited by: 6

AI Technical Summary

Problems solved by technology

In many tasks, the annotation scheme for a corpus is hard to get right. If the categories are too coarse, the language cannot be described comprehensively and in detail; if they are too fine, the annotation information becomes too large and annotation efficiency drops. Both labeling tasks and named entity recognition tasks require a large amount of labeled data to serve as training data.
[0004] Existing labeling systems and methods support multi-person real-time labeling, but all of them import the labeling data at once and hand it over to the labelers. Because the distribution of the labeling data is unknown before labeling, the extracted to-be-labeled data is often imbalanced: some important data that needs labeling is never extracted, while other data is extracted too often and labeled repeatedly. The resulting uneven distribution of to-be-labeled data degrades the effect of the labeled data on model training.

Embodiment Construction

[0056] Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope will be fully conveyed to those skilled in the art.

[0057] An embodiment of the present invention provides a method for processing labeled data that updates the data to be labeled based on real-time statistics of the labeling results, thereby improving the training effect of the model. As shown in Figure 1, the method includes:

[0058] 101. Obtain sample data under each category, randomly extracted from machine data, as the data to be labeled.

[0059] Here, the machine data is...
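Step 101 can be read as stratified random sampling: draw samples from every category rather than from the pool as a whole, so that large categories do not crowd out small ones. The following is a minimal sketch under assumptions the source does not state: each machine-data record is assumed to already carry a category label, and the per-category quota `per_category` and the function name `sample_per_category` are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_category(records, per_category, seed=None):
    """Randomly draw up to `per_category` records from each category.

    `records` is an iterable of (text, category) pairs. The category is
    assumed (hypothetically) to be attached to the machine data already.
    Returns a dict mapping category -> list of sampled texts.
    """
    rng = random.Random(seed)

    # Group the machine data by its (assumed) category label.
    by_category = defaultdict(list)
    for text, category in records:
        by_category[category].append(text)

    to_label = {}
    for category, texts in by_category.items():
        # random.sample requires k <= population size, so cap the quota.
        k = min(per_category, len(texts))
        to_label[category] = rng.sample(texts, k)
    return to_label


if __name__ == "__main__":
    # 20 "sports" records and 10 "finance" records of hypothetical machine data.
    machine_data = [(f"doc{i}", "sports" if i % 3 else "finance") for i in range(30)]
    batch = sample_per_category(machine_data, per_category=5, seed=0)
    for category, texts in batch.items():
        print(category, len(texts))
```

Even though "sports" outnumbers "finance" two to one in this toy input, five records are drawn from each, which is the balancing property the method is after. Categories with fewer records than the quota simply contribute all of them.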

Abstract

The invention discloses an annotated data processing method and device, computer equipment, and a computer storage medium. It relates to the technical field of data annotation and aims to adjust the distribution of to-be-annotated data and improve the effect of annotated data on model training. The method comprises: obtaining sample data under each category, randomly extracted from machine data, as to-be-labeled data; while the to-be-labeled data is being labeled on a labeling platform, counting the labeled data under each category and judging whether the labeled data under each category reaches a training standard preset for a classification prediction model; if so, taking the labeled data of the categories that reach the training standard as training data and inputting it into a network model for training to obtain the classification prediction model; and updating the to-be-labeled data according to the prediction probabilities that the classification prediction model outputs for test data.
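The counting, checking, and updating steps summarized above can be sketched as follows. This is a minimal sketch, not the patent's implementation, and it rests on several assumptions the source does not state: the "training standard" is modeled as a hypothetical minimum count `min_per_category`; the network model itself is abstracted away, and its per-class probabilities on test data are passed in directly; and "updating the to-be-labeled data according to the prediction probability" is modeled as uncertainty sampling, i.e. queuing test samples whose highest predicted probability falls below a hypothetical `confidence_threshold`. All function names are illustrative.

```python
from collections import Counter


def count_labeled(labeled):
    """Count labeled samples per category; `labeled` is a list of (text, category)."""
    return Counter(category for _, category in labeled)


def categories_meeting_standard(counts, min_per_category):
    """Return categories whose labeled count reaches the (assumed) training standard."""
    return {c for c, n in counts.items() if n >= min_per_category}


def select_training_data(labeled, ready_categories):
    """Keep only labeled samples from categories that reached the standard."""
    return [(t, c) for t, c in labeled if c in ready_categories]


def update_to_label(to_label, test_probs, confidence_threshold):
    """Append low-confidence test samples to the to-be-labeled pool.

    `test_probs` maps a test text to {category: probability}, as produced
    by the (hypothetical) trained classification prediction model. A sample
    whose highest probability is below `confidence_threshold` is treated as
    uncertain and queued for labeling, skipping anything already queued.
    """
    queued = set(to_label)
    updated = list(to_label)
    for text, probs in test_probs.items():
        if max(probs.values()) < confidence_threshold and text not in queued:
            updated.append(text)
            queued.add(text)
    return updated


if __name__ == "__main__":
    labeled = [("a", "sports"), ("b", "sports"), ("c", "sports"), ("d", "finance")]
    counts = count_labeled(labeled)
    ready = categories_meeting_standard(counts, min_per_category=3)
    train = select_training_data(labeled, ready)
    print(counts, ready, len(train))
    # Training the network model on `train` would happen here (omitted).
    test_probs = {
        "t1": {"sports": 0.95, "finance": 0.05},
        "t2": {"sports": 0.55, "finance": 0.45},
    }
    print(update_to_label(["e"], test_probs, confidence_threshold=0.8))
```

Only "sports" reaches the assumed standard of three labeled samples, so only its samples become training data, while "finance" stays in the labeling queue. Uncertainty sampling is one plausible reading of "updating according to the prediction probability"; the patent may use a different selection rule.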

Description

Technical Field

[0001] The present invention relates to the technical field of data labeling, and in particular to a labeled data processing method and device, computer equipment, and a computer storage medium.

Background

[0002] In recent years, with the continuous development of computer and Internet technology, intelligent applications have emerged one after another, and tools such as big data and artificial intelligence have gradually been put into practice. Natural language processing is a branch of artificial intelligence that enables computers to understand human language, including the content, thoughts, and emotions it expresses.

[0003] Because mainstream natural language processing technology is mainly based on statistical machine learning, it relies on two things: statistical models and optimization algorithms for different tasks, and the corresponding large-scale corpora. Th...

Application Information

IPC (8): G06K9/62; G06N3/04; G06N3/08
CPC: G06N3/08; G06N3/045; G06F18/241
Inventor: 刘逸哲
Owner: 大箴(杭州)科技有限公司