A text annotation noise detection method, device, storage medium and electronic equipment

A text annotation noise detection technology, applied in the field of deep learning, that addresses the cumbersome handling of noisy / mislabeled data and achieves flexible trust configuration, strong integration, and simplified engineering steps.

Active Publication Date: 2022-02-01
镁佳(北京)科技有限公司

AI Technical Summary

Problems solved by technology

[0006] In view of this, the embodiments of the present invention provide a text annotation noise detection method, device, storage medium and electronic equipment, to solve the technical problem that prior-art solutions for noisy / mislabeled data are cumbersome.

Examples

Embodiment Construction

[0029] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0030] An embodiment of the present invention provides a text annotation noise detection method. As shown in Figure 1, the detection method includes the following steps:

[0031] Step S101: Acquire the sample data set of the model to be trained; specifically, due to crowdsourcing or the influence of subjective judgment differences by multiple label...

Abstract

The invention discloses a text annotation noise detection method, device, storage medium and electronic equipment. The method includes: obtaining a sample data set of a model to be trained; obtaining model prediction results by K-fold cross-validation on the sample data set; calculating the trust degree of each data item in the sample data set from the output of the model prediction results; and determining the noise texts of the sample data set according to the relationship between the trust degree and a trust degree threshold. The invention thus proposes a trust degree metric: by evaluating the trust degree of the data in the sample data set, noise texts are screened out according to the threshold, which can be used to correct errors in engineering data annotation. Moreover, because the detection process is independent of the neural network model, no changes to the model are needed; compared with probability-estimation and robustness-based methods, the detection method therefore integrates easily and simplifies tedious engineering steps. It also provides flexible trust configuration, making the detection process and its effect more controllable.
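The four steps in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the text available here does not define how the trust degree is computed, so the sketch assumes it is the out-of-fold probability that the model assigns to the annotated label. The tiny Naive Bayes classifier stands in for "the model to be trained", and the names `train_nb`, `predict_proba`, `detect_noise`, `TEXTS` and `LABELS`, as well as the default threshold of 0.5, are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy annotated corpus (hypothetical). Item 6 is deliberately mislabeled.
TEXTS = [
    "play some music", "navigate to the airport", "play the next song",
    "navigate to the station", "play a song", "navigate home now",
    "play music now", "navigate to work", "play my music",
]
LABELS = ["music", "nav", "music", "nav", "music", "nav", "nav", "nav", "music"]


def train_nb(texts, labels):
    """Fit a small multinomial Naive Bayes model (stand-in for the real model)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(labels)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict_proba(model, text):
    """Return {label: probability} for one text, with add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        # +1 reserves probability mass for tokens never seen in training.
        denom = sum(word_counts[label].values()) + len(vocab) + 1
        score = math.log(n / total)
        for tok in text.split():
            score += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    peak = max(log_scores.values())
    exps = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(exps.values())
    return {label: v / norm for label, v in exps.items()}


def detect_noise(texts, labels, k=3, threshold=0.5):
    """K-fold cross-validation, then flag items whose trust is below threshold.

    ASSUMPTION: trust degree = out-of-fold probability of the annotated label.
    """
    n = len(texts)
    trust = [0.0] * n
    for fold in range(k):
        held_out = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        model = train_nb([texts[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            # Each item is scored by a model that never saw it during training.
            trust[i] = predict_proba(model, texts[i]).get(labels[i], 0.0)
    noisy = [i for i in range(n) if trust[i] < threshold]
    return trust, noisy
```

Calling `detect_noise(TEXTS, LABELS)` returns one trust value per item plus the indices flagged as noise texts. Raising or lowering `threshold` trades recall of mislabeled items against false alarms, which is one way to read the "flexible trust configuration" mentioned above. Because the function only consumes predictions, the classifier can be swapped without touching the detection logic.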

Description

Technical field

[0001] The present invention relates to the technical field of deep learning, in particular to a method, device, storage medium and electronic equipment for detecting text annotation noise.

Background

[0002] In intelligent conversational devices, deep learning neural networks are widely used in various tasks, such as speech classification, intent recognition and semantic slot recognition. The device's analysis of the instructions issued by the user depends on the recognition results of the neural network model.

[0003] Because of factors such as crowdsourcing and differences in subjective judgment among multiple annotators, the corpus annotated to build the neural network training set contains varying degrees of noisy / mislabeled data. When the neural network is trained and fitted, this noisy / mislabeled data directly degrades the model's performance and can even cause user intent to be misidentified.

[0004] In the pr...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/295, G06F40/216, G06F40/30, G06N20/00
CPC: G06F40/295, G06F40/30, G06F40/216, G06N20/00
Inventor: 马星扬, 夏妍
Owner: 镁佳(北京)科技有限公司