A corpus generation method, device, electronic device and readable storage medium

A technology of electronic equipment and corpus, applied in the creation of semantic tools, digital data processing, natural language data processing, etc., that can solve the problems of costly, error-prone manual annotation and low accuracy of control model training.

Active Publication Date: 2022-02-18
GREE ELECTRIC APPLIANCES INC +1
Cites: 7; Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a corpus generation method, device, electronic equipment and readable storage medium, which are used to solve the problem in the prior art that manual labeling is costly and error-prone, resulting in low accuracy of control model training.



Examples


Embodiment 1

[0040] Figure 1 is a schematic diagram of a corpus generation process provided by an embodiment of the present invention; the process includes the following steps:

[0041] S101: According to the identification information of each first vocabulary classification set corresponding to the sentence structure, acquire the first vocabulary in each first vocabulary classification set in the control vocabulary database, wherein the sentence structure is preset.
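Step S101 amounts to a lookup keyed by the identification information of each first vocabulary classification set. The sketch below assumes a hypothetical in-memory control vocabulary database (a dict from set identifier to its words) and a preset sentence structure written as an ordered list of set identifiers; the identifiers "action", "device" and "mode" and the function name `acquire_first_vocabulary` are illustrative only and do not come from the patent.

```python
from typing import Dict, List

# Hypothetical control vocabulary database: identification information of each
# vocabulary classification set -> the words in that set (illustrative only).
CONTROL_VOCABULARY_DB: Dict[str, List[str]] = {
    "action": ["turn on", "turn off"],
    "device": ["air conditioner", "fan"],
    "mode": ["cooling mode", "heating mode"],
}

# Hypothetical preset sentence structure, expressed as the identifiers of the
# first vocabulary classification sets it uses, in slot order.
SENTENCE_STRUCTURE: List[str] = ["action", "device", "mode"]


def acquire_first_vocabulary(
    structure: List[str], vocabulary_db: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Return the first vocabulary of every set referenced by the structure."""
    acquired: Dict[str, List[str]] = {}
    for set_id in structure:
        if set_id not in vocabulary_db:
            raise KeyError(f"unknown vocabulary classification set: {set_id}")
        # Copy the list so later steps cannot mutate the database entry.
        acquired[set_id] = list(vocabulary_db[set_id])
    return acquired


if __name__ == "__main__":
    print(acquire_first_vocabulary(SENTENCE_STRUCTURE, CONTROL_VOCABULARY_DB))
```

A plain dict keeps the sketch short; a real system would more likely query a database, which the source does not describe.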

[0042] The corpus generation method provided by the embodiment of the present invention is applied to an electronic device, such as a desktop computer or a server. Preferably, because model training requires a large amount of corpus data, the electronic device may be a device with relatively high computing capability.

[0043] The electronic device is preset with a sentence structure, wherein the sentence structure includes at least one of a subject-predicate-object ...

Embodiment 2

[0055] To further expand the volume of corpus data for control model training, on the basis of the above embodiments, in this embodiment of the present invention, generating the first corpus conforming to the sentence structure further includes:

[0056] For the first vocabulary in the first corpus, obtaining synonyms of the first vocabulary, and replacing the first vocabulary in the first corpus with the synonyms to generate further first corpus.
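The synonym-replacement step can be sketched as follows. This is a minimal illustration, assuming a hypothetical synonym table (the source does not say where synonyms come from), corpus entries stored as (word, set identifier) pairs, and replacement of one word at a time; the function name `expand_with_synonyms` is likewise hypothetical.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, identifier of its vocabulary classification set)

# Hypothetical synonym table; the source does not say where synonyms come from.
SYNONYMS: Dict[str, List[str]] = {
    "turn on": ["switch on", "start"],
    "air conditioner": ["AC"],
}


def expand_with_synonyms(
    entry: List[Token], synonyms: Dict[str, List[str]]
) -> List[List[Token]]:
    """Return new entries, each replacing one word of `entry` with a synonym."""
    expanded: List[List[Token]] = []
    for index, (word, set_id) in enumerate(entry):
        for synonym in synonyms.get(word, []):
            new_entry = list(entry)  # copy so the original entry is untouched
            # Assumption: the synonym keeps the set identifier (label) of the
            # word it replaces, so the new entry needs no manual labeling.
            new_entry[index] = (synonym, set_id)
            expanded.append(new_entry)
    return expanded


if __name__ == "__main__":
    sample = [("turn on", "action"), ("air conditioner", "device"), ("cooling mode", "mode")]
    for new_entry in expand_with_synonyms(sample, SYNONYMS):
        print(new_entry)
```

Replacing one word per new entry keeps growth linear in the number of synonyms; taking every combination of synonym choices instead would enlarge the corpus faster, and the source does not say which variant is intended.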

[0057] In the actual control process of the control model, because users have different usage habits and actual needs, different control instructions may be used to perform the same control, which requires the control model to identify these control instructions as accurately as possible. Therefore, a large amount of corpus is used during control model training to achieve higher recognition accuracy. In the embodiment of the present inven...

Embodiment 3

[0063] On the basis of the above embodiments, in this embodiment of the present invention, if the control vocabulary also includes a second vocabulary classification set, then after the first corpus conforming to the sentence structure is generated, the method further includes:

[0064] According to the saved second position information, in the sentence structure, of the vocabulary in the second vocabulary classification set, insert the third vocabulary of the second vocabulary classification set into the corresponding position in the first corpus to update the first corpus.
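The update in [0064] can be pictured as list insertion at a saved index. A minimal sketch, assuming a hypothetical second vocabulary classification set of request prefixes ("please", "help me", echoing the example in [0065]), a saved second position of 0, and one updated copy of the entry per third-vocabulary word; the identifier "prefix" and the function name `insert_third_vocabulary` are illustrative only.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, identifier of its vocabulary classification set)

# Hypothetical second vocabulary classification set and its saved second
# position information (slot index in the sentence) -- illustrative only.
SECOND_SET_ID = "prefix"
THIRD_VOCABULARY: List[str] = ["please", "help me"]
SECOND_POSITION = 0


def insert_third_vocabulary(
    entry: List[Token], words: List[str], position: int, set_id: str
) -> List[List[Token]]:
    """Return one updated copy of `entry` per word, inserted at `position`."""
    if not 0 <= position <= len(entry):
        raise ValueError(f"position {position} is outside the sentence")
    updated: List[List[Token]] = []
    for word in words:
        new_entry = list(entry)  # leave the original first corpus entry intact
        # The inserted word carries its set identifier, so it is labeled too.
        new_entry.insert(position, (word, set_id))
        updated.append(new_entry)
    return updated


if __name__ == "__main__":
    base = [("turn on", "action"), ("air conditioner", "device"), ("cooling mode", "mode")]
    for new_entry in insert_third_vocabulary(base, THIRD_VOCABULARY, SECOND_POSITION, SECOND_SET_ID):
        print(new_entry)
```

If several second vocabulary classification sets are inserted into one entry, each insertion shifts later indices, so inserting from the highest saved position downward would keep the saved positions valid; the source does not address this case.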

[0065] In the actual control process of the control model, because users have different usage habits and actual needs, different control instructions may be received from users. Taking voice control of an air conditioner as an example, some users habitually say "please turn on the cooling mode of the air conditioner", while others habitually say "help me turn on the cooling mod...



Abstract

The invention discloses a corpus generation method, device, electronic equipment and readable storage medium. The method includes: according to the identification information of each first vocabulary classification set corresponding to the sentence structure, obtaining from the control vocabulary base the first vocabulary in each first vocabulary classification set, wherein the sentence structure is preset; and, according to the first position information of the vocabulary of each first vocabulary classification set in the sentence structure, combining the acquired first vocabulary to generate a first corpus conforming to the sentence structure. In the present invention, the generated first corpus is composed of first vocabulary belonging to the first vocabulary classification sets, and each first vocabulary is placed at the first position in the sentence structure that corresponds to the first vocabulary classification set to which it belongs. There is therefore no need for further manual labeling of the corpus, which saves the cost of manual labeling, reduces the error rate, and thus improves the accuracy of control model training.
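The combination step described in the abstract can be illustrated with a Cartesian product over the first vocabulary, ordered by first position information. This is a minimal sketch, not the patented implementation: it assumes hypothetical sets "action", "device" and "mode", first position information stored as a slot index per set, and corpus entries kept as (word, set identifier) pairs so that each word's set identifier serves as its label. The names `generate_first_corpus` and `to_text` are illustrative.

```python
from itertools import product
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, identifier of its vocabulary classification set)

# Hypothetical first vocabulary acquired from the control vocabulary base.
FIRST_VOCABULARY: Dict[str, List[str]] = {
    "action": ["turn on", "turn off"],
    "device": ["air conditioner", "fan"],
    "mode": ["cooling mode"],
}

# Hypothetical first position information: slot index of each set in the
# preset sentence structure (dict order deliberately differs from slot order).
FIRST_POSITIONS: Dict[str, int] = {"mode": 2, "action": 0, "device": 1}


def generate_first_corpus(
    vocabulary: Dict[str, List[str]], positions: Dict[str, int]
) -> List[List[Token]]:
    """Combine one word from each set per sentence, placed by slot index."""
    # Arrange the sets in the order of their slots in the sentence structure.
    ordered_sets = sorted(positions, key=lambda set_id: positions[set_id])
    corpus: List[List[Token]] = []
    # Cartesian product: every combination of one word per set.
    for words in product(*(vocabulary[set_id] for set_id in ordered_sets)):
        # Each word carries its set identifier, which doubles as its label.
        corpus.append(list(zip(words, ordered_sets)))
    return corpus


def to_text(entry: List[Token]) -> str:
    """Join the words of one corpus entry into a plain sentence."""
    return " ".join(word for word, _ in entry)


if __name__ == "__main__":
    for entry in generate_first_corpus(FIRST_VOCABULARY, FIRST_POSITIONS):
        print(to_text(entry), entry)
```

With two actions, two devices and one mode, the sketch yields 2 × 2 × 1 = 4 labeled sentences, the first being "turn on air conditioner cooling mode". Because the labels come from the set identifiers, no annotator is involved, which is the effect the abstract claims; the corpus size grows multiplicatively with the size of each set.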

Description

Technical field

[0001] The present invention relates to the field of smart home technology, and in particular to a corpus generation method, device, electronic equipment and readable storage medium.

Background art

[0002] With the rapid development of smart home technology, voice-based home control is becoming increasingly common. Generally, a large amount of data is first used to train a semantic model, so that users can control home devices through the trained semantic model; this greatly improves the recognition accuracy of the semantic system.

[0003] Training a semantic model requires a large amount of annotated corpus. At present, annotated corpus is mainly obtained by collecting a large amount of network text data, which annotators then label manually.

[0004] However, manually labeling network text data is costly and error-prone, so the accuracy of control mode...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/36; G06F16/35; G06F40/211; G06F40/247; G06F40/289
CPC: G06F16/36; G06F16/35
Inventors: Huang Zirong (黄姿荣), Jia Jutao (贾巨涛), Wu Wei (吴伟), Qin Zining (秦子宁), Zhao Penghui (赵鹏辉)
Owner: GREE ELECTRIC APPLIANCES INC