Corpus enhancement method and device, computer equipment and readable storage medium

A corpus enhancement technique in the field of artificial intelligence that addresses low corpus quality and improves the confidence of labels assigned to unlabeled corpora.

Pending Publication Date: 2022-03-18
ONE CONNECT SMART TECH CO LTD SHENZHEN

AI Technical Summary

Problems solved by technology

[0004] In view of the above-mentioned shortcomings of the prior art, the purpose of the present invention is to provide a corpus enhancement

Example Embodiment

[0044] The embodiments of the present invention are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the present invention from the contents disclosed in this specification. The present invention can also be implemented or applied through other specific embodiments, and the details in this specification can be modified or changed for different viewpoints and applications without departing from the spirit of the present invention.

[0045] It should be noted that the drawings provided in this embodiment only illustrate the basic concept of the present invention schematically. They therefore show only the components related to the present invention, rather than being drawn to the number, shape, and size of the components in an actual implementation. The type, quantity, and proportion of each component can be changed at will in actual implementation, and th...

Abstract

The invention belongs to the technical field of artificial intelligence and provides a corpus enhancement method and device, computer equipment, and a readable storage medium. The corpus enhancement method comprises the following steps: obtaining a dictionary and a corpus; vectorizing the unlabeled corpora in the corpus according to a first word frequency and a second word frequency to obtain an N-dimensional feature matrix; obtaining the labels in the corpus and performing matrix processing on the N-dimensional feature matrix according to the labels to obtain a first matrix that carries the semantics of labels and words and a second matrix that carries the semantics of labels and corpora; and computing the similarity between the first matrix and the second matrix and assigning labels to the unlabeled corpora according to that similarity. The enhanced corpus captures the associations among the labels, the corpus, the dictionary, and the unlabeled corpora, which improves the confidence of the labels assigned to the unlabeled corpora and provides high-quality corpora for model training.
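The pipeline in the abstract (vectorize by word frequency, build label-level representations, compare by similarity, assign labels) can be illustrated with a minimal sketch. The abstract does not define the two word frequencies or the matrix processing step, so this sketch substitutes a plain TF-IDF weighting with nearest-centroid matching. Treating the first and second word frequencies as term frequency and inverse document frequency is an assumption, as are the use of cosine similarity and all function names and sample data below.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """Vectorize tokenized documents over a fixed vocabulary (dictionary).

    Assumption: term frequency and smoothed inverse document frequency
    stand in for the abstract's first and second word frequencies.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = {w: sum(1 for d in docs if w in d) for w in vocabulary}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append([
            (counts[w] / total) * (math.log((1 + n_docs) / (1 + df[w])) + 1.0)
            for w in vocabulary
        ])
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def label_centroids(vectors, labels):
    """Average the vectors of labeled documents per label.

    The result is a label-by-word table, a simple stand-in for a matrix
    relating labels to words.
    """
    sums, counts = {}, Counter()
    for vec, label in zip(vectors, labels):
        if label is None:
            continue
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def assign_labels(docs, labels, vocabulary):
    """Give each unlabeled document the label of its most similar centroid."""
    vectors = tfidf_vectors(docs, vocabulary)
    centroids = label_centroids(vectors, labels)
    result = []
    for vec, label in zip(vectors, labels):
        if label is None:
            label = max(centroids, key=lambda c: cosine(vec, centroids[c]))
        result.append(label)
    return result


# Hypothetical toy corpus: None marks an unlabeled document.
docs = [
    ["loan", "interest", "rate"],
    ["card", "payment", "fee"],
    ["loan", "rate", "approval"],
    ["card", "fee", "refund"],
]
labels = ["loan", "card", None, None]
vocabulary = sorted({w for d in docs for w in d})
print(assign_labels(docs, labels, vocabulary))
```

On this toy corpus the two unlabeled documents share words only with the "loan" and "card" examples respectively, so they receive those labels. The similarity score itself could serve as a rough confidence value for filtering low-quality assignments, though the abstract does not say how confidence is measured.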

Description

Technical field

[0001] The present invention relates to the technical field of artificial intelligence, and in particular to a corpus enhancement method, apparatus, computer equipment, and readable storage medium.

Background

[0002] In recent years, the development of artificial intelligence (AI) technology has driven progress in natural language processing (NLP), and the precision of natural language processing keeps rising. Training a natural language processing model requires a large amount of data. In some processing tasks in particular, the data volume is often insufficient, labeling is very difficult, and the labels are inaccurate. It is therefore necessary to perform data enhancement on the labeled corpus to produce more high-quality corpora.

[0003] At present, corpus enhancement methods include synonym replacement, entity replacemen...

Application Information

IPC(8): G06K9/62, G06F40/216, G06F40/242
CPC: G06F40/216, G06F40/242, G06F18/214, G06F18/24
Inventor: 卢宁
Owner: ONE CONNECT SMART TECH CO LTD SHENZHEN