Data feature enhancement method and device for corpus data and computer equipment

A data feature enhancement technology for corpus data, applied in computer parts, computing, semantic analysis, etc., that addresses the high labor cost and low efficiency of obtaining high-quality training sets.

Pending Publication Date: 2020-11-10
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0005] Embodiments of the present invention provide a data feature enhancement method, device, computer equipment, and storage medium for corpus data. They aim to solve two problems in the prior art: expanding the training corpus manually incurs high labor costs, and the data cleaning performed while expanding the corpus is also done manually, which makes obtaining a high-quality training set inefficient.




Embodiment Construction

[0036] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are some of the embodiments of the present invention, but not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0037] It should be understood that when used in this specification and the appended claims, the terms "comprising" and "comprises" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

[0038] It should also be understood that the terminology used ...


Abstract

The invention discloses a data feature enhancement method and device for corpus data, computer equipment, and a storage medium, and relates to artificial intelligence technology. After a full corpus data set is obtained, the data is grouped into multiple corpus data subsets. A to-be-trained user intention recognition model is trained after deleting each corpus data subset in turn, which yields a plurality of user intention recognition models, and each piece of data in the full corpus data set serves as both training sample data and test sample data. A model average correct rate difference value, a sample recall rate difference value, and a prediction correct rate difference value are calculated to obtain a sample contribution degree triple for each piece of corpus data. If all three difference values in the triple for a piece of corpus data are negative, that piece is taken as target corpus data and added to a to-be-deleted corpus data set, so that the target corpus data is deleted from the full corpus data set. Negative-contribution corpus data is thus cleaned automatically, without human intervention, which improves the efficiency of obtaining a high-quality training set.
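
The cleaning loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not define how the three metrics are computed or what each model is compared against, so the sketch assumes a baseline model trained on the full corpus and takes each difference as "baseline minus model trained without the item's group". The function names (split_into_groups, contribution_triples, clean_corpus), the round-robin grouping, and the caller-supplied train_fn and metrics_fn are all hypothetical.

```python
def split_into_groups(n_items, n_groups):
    """Round-robin split of item indices into n_groups index lists.

    The grouping strategy is an assumption; the abstract only says
    that the full corpus is grouped into subsets.
    """
    return [list(range(k, n_items, n_groups)) for k in range(n_groups)]


def contribution_triples(corpus, n_groups, train_fn, metrics_fn):
    """Return one contribution triple per corpus item, aligned by index.

    train_fn(samples) -> model
    metrics_fn(model, sample) -> (average_accuracy, recall, prediction_accuracy)

    Both callables are hypothetical placeholders for the intent
    recognition model and its evaluation.
    """
    triples = [None] * len(corpus)
    # Assumed reference point: a model trained on the full corpus.
    baseline = train_fn(list(corpus))
    for held_out in split_into_groups(len(corpus), n_groups):
        removed = set(held_out)
        kept = [s for i, s in enumerate(corpus) if i not in removed]
        # Model trained after deleting this group from the corpus.
        ablated = train_fn(kept)
        for i in held_out:
            with_item = metrics_fn(baseline, corpus[i])
            without_item = metrics_fn(ablated, corpus[i])
            # Negative difference: the metric is better without the item.
            triples[i] = tuple(w - wo for w, wo in zip(with_item, without_item))
    return triples


def clean_corpus(corpus, triples):
    """Drop every item whose three difference values are all negative."""
    return [s for s, t in zip(corpus, triples) if not all(d < 0 for d in t)]
```

With n_groups equal to the corpus size, each item is ablated on its own; with fewer groups, fewer models are trained, but the items in a group share the same ablated model, so their triples reflect the group's joint effect.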

Description

Technical field

[0001] The present invention relates to the technical field of artificial intelligence model hosting, in particular to a data feature enhancement method, device, computer equipment and storage medium for corpus data.

Background technique

[0002] Traditional conversational robots use corpus data to train deep learning models to complete tasks such as user intent recognition. The quality of the training corpus is key to the model's performance, and it is generally measured in two respects: "quality" and "quantity". "Quality" means that the corpus is correct and that the boundaries between different intents are clear; "quantity" means that there is enough data for the model to fully learn the distribution of data features. The two complement each other, and both are indispensable.

[0003] When sorting out the training data, the R&D personnel found that when expanding the "quantity" of the training set, adding a sample to the training set does...


Application Information

IPC(8): G06K9/62, G06F40/30
CPC: G06F40/30, G06F18/214, Y02D10/00
Inventor: 林佳佳, 郝正鸿, 王少军, 肖京
Owner PING AN TECH (SHENZHEN) CO LTD