Dynamic data labeling method and device for coarse-grained text classification

A dynamic data labeling technique for text classification, applied in the field of data labeling. It addresses the labeling deviations that arise when annotators understand a text differently, with the aims of preserving model quality, reducing workload, and reducing errors.

Active Publication Date: 2019-09-27
成都冰鉴信息科技有限公司

AI Technical Summary

Problems solved by technology

However, traditional data labeling relies on manual annotation, which is prone to errors. In particular, each data labeler understands the text differently, which leads to deviations in text data labeling.



Examples


Embodiment Construction

[0022] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure, and to fully convey the scope of the present disclosure to those skilled in the art.

[0023] Figure 1 shows the flow chart of the dynamic data labeling method for coarse-grained text classification provided by an embodiment of the present invention. Referring to Figure 1, the method includes:

[0024] S1. Balance the labeled data according to the proportion of label categories, where the data corresponding ...
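Step S1 is truncated in this excerpt, so the exact balancing rule is not visible here. As a minimal sketch, assuming that balancing means downsampling every label category to the size of the smallest one, the idea might look like the following in Python. The function name `balance_by_label`, the `(text, label)` pair layout, the sample data, and the fixed random seed are illustrative assumptions, not details taken from the patent.

```python
import random
from collections import defaultdict


def balance_by_label(samples, seed=0):
    """Downsample each label category to the size of the smallest one.

    `samples` is a list of (text, label) pairs. Downsampling is an ASSUMED
    balancing rule; the excerpt does not state which rule the method uses.
    """
    if not samples:
        return []

    # Group the labeled samples by their category.
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical labeled data: three complaints, one question, two praises.
labeled = [
    ("refund not received", "complaint"),
    ("app keeps crashing", "complaint"),
    ("card was charged twice", "complaint"),
    ("how do I reset my password", "question"),
    ("great service, thanks", "praise"),
    ("very helpful support team", "praise"),
]
balanced = balance_by_label(labeled)
```

With this rule every category ends up with as many samples as the rarest one, so the balanced set here holds one sample per label. Upsampling the rarer categories would be an equally plausible reading of the step.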


Abstract

The invention provides a dynamic data labeling method and device for coarse-grained text classification. The method comprises: balancing the labeled data according to the proportion of label categories; constructing a TF-IDF term-frequency matrix for the text; screening features using the chi-square distribution to obtain a training data set; training on the training data set with a machine learning algorithm to obtain an initial model; acquiring a test data set and using the initial model to label a first preset data volume of the test data, obtaining predicted labeled data; extracting, for each label category, a second preset number of predicted labeled items for auditing, obtaining the data corresponding to each label; adding the data corresponding to each label to the training data set and training on it with the machine learning algorithm to obtain a correction model; and judging whether the training data volume reaches a second preset data volume, and if not, continuing to execute the process; wherein the stored correction model is the prediction model.
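The iterative part of the abstract (predict, audit a sample per label, retrain, check the data volume) can be sketched in Python as below. This is an illustration under stated assumptions, not the patented implementation: `train` and `predict` are a toy word-overlap stand-in for the TF-IDF, chi-square screening, and machine learning steps, and the names `dynamic_labeling`, `audit`, `batch_size`, `per_label`, and `target_size` are hypothetical. The sketch also assumes that predictions not drawn for auditing are simply discarded, which the abstract does not specify.

```python
import random
from collections import Counter, defaultdict


def train(samples):
    """Toy stand-in for 'TF-IDF + chi-square screening + ML algorithm'.

    Returns per-label word counts; a real system would fit a classifier.
    """
    word_counts = defaultdict(Counter)
    for text, label in samples:
        word_counts[label].update(text.lower().split())
    return word_counts


def predict(model, text):
    """Pick the label whose training words overlap most with the text."""
    words = text.lower().split()
    return max(sorted(model), key=lambda label: sum(model[label][w] for w in words))


def dynamic_labeling(train_set, pool, audit, batch_size, per_label, target_size, seed=0):
    """Grow the training set with audited model predictions, then retrain.

    batch_size  ~ the "first preset data volume" labeled by the model per round
    per_label   ~ the "second preset number" of predictions audited per label
    target_size ~ the "second preset data volume" that ends the loop
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rng = random.Random(seed)
    train_set, pool = list(train_set), list(pool)
    model = train(train_set)  # initial model
    while len(train_set) < target_size and pool:
        batch, pool = pool[:batch_size], pool[batch_size:]
        predicted = defaultdict(list)
        for text in batch:
            predicted[predict(model, text)].append(text)
        for label in sorted(predicted):
            texts = predicted[label]
            for text in rng.sample(texts, min(per_label, len(texts))):
                # `audit` returns the reviewer-confirmed label for this text.
                train_set.append((text, audit(text, label)))
        model = train(train_set)  # correction model
    return model, train_set  # the last correction model serves as the prediction model


# Hypothetical data; `truth` plays the role of the human auditor.
seed_data = [("refund my money", "complaint"), ("thanks for the help", "praise")]
unlabeled = ["I want my money back", "thanks a lot", "refund please", "many thanks for this"]
truth = {
    "I want my money back": "complaint",
    "thanks a lot": "praise",
    "refund please": "complaint",
    "many thanks for this": "praise",
}
model, grown = dynamic_labeling(
    seed_data, unlabeled, audit=lambda text, predicted_label: truth[text],
    batch_size=2, per_label=1, target_size=4,
)
```

Keeping the auditor as a callback keeps the human review step explicit: the model only proposes labels, and only audited items enter the training set before the next round of training.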

Description

Technical field

[0001] The present invention relates to the technical field of data labeling, and in particular to a dynamic data labeling method and device for coarse-grained text classification.

Background

[0002] Coarse-grained text classification is classification at the sentence level. Common coarse-grained text classification projects rely on supervised learning, so a good-quality dataset is the basis for model building. However, traditional data labeling relies on manual annotation, which is prone to the following errors: 1. Each data labeler understands the text differently, which leads to deviations in text data labeling. 2. Because of prior knowledge, a labeler may misunderstand the subject of the text and assign the wrong category label.

Summary of the invention

[0003] The present invention aims to provide...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06K9/62
CPC: G06F16/355, G06F18/2113, G06F18/214, Y02D10/00
Inventor: 顾凌云, 严涵, 王洪阳
Owner: 成都冰鉴信息科技有限公司