A corpus generation method, device, electronic device and readable storage medium
A corpus generation technology for electronic devices, applied in fields such as semantic tool creation, digital data processing and natural language data processing. It addresses problems such as the high cost of manual annotation, its proneness to error, and the low accuracy of control model training.
Examples
Embodiment 1
[0040] Figure 1 is a schematic diagram of a corpus generation process provided by an embodiment of the present invention. The process includes the following steps:
[0041] S101: According to the identification information of each first vocabulary classification set corresponding to the sentence structure, acquire the first vocabulary in each first vocabulary classification set in the control vocabulary database, wherein the sentence structure is preset.
[0042] The corpus generation method provided by the embodiment of the present invention is applied to an electronic device, such as a desktop computer or a server. Preferably, since model training requires a large amount of corpus data, the electronic device may be a device with relatively high computing capability.
[0043] The electronic device is preset with a sentence structure, wherein the sentence structure includes at least one of a subject-predicate-object ...
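Step S101 can be illustrated with a minimal Python sketch. It assumes the control vocabulary database is a mapping from a set identifier to a list of words, and that the preset sentence structure is an ordered list of those identifiers. The names `CONTROL_VOCABULARY`, `SENTENCE_STRUCTURE`, `acquire_first_vocabulary` and `generate_first_corpus` are hypothetical, as are the example words. Combining one word from each set in order is also an assumption, since the excerpt does not state the combination rule.

```python
from itertools import product

# Hypothetical control vocabulary database: set identifier -> first vocabulary.
CONTROL_VOCABULARY = {
    "verb": ["turn on", "turn off"],
    "object": ["the air conditioner", "the fan"],
}

# Hypothetical preset sentence structure: ordered identifiers of the sets.
SENTENCE_STRUCTURE = ["verb", "object"]


def acquire_first_vocabulary(structure, database):
    """S101: look up each classification set by its identification information."""
    return [database[set_id] for set_id in structure]


def generate_first_corpus(structure, database):
    """Assumed combination rule: one word per set, joined in structure order."""
    word_lists = acquire_first_vocabulary(structure, database)
    return [" ".join(words) for words in product(*word_lists)]


first_corpus = generate_first_corpus(SENTENCE_STRUCTURE, CONTROL_VOCABULARY)
```

With two sets of two words each, this sketch yields four sentences. The corpus size grows with the product of the set sizes, which is why a device with relatively high computing capability may be preferred.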
Embodiment 2
[0055] In order to further expand the data volume of the corpus for control model training, on the basis of the above-mentioned embodiments, in the embodiment of the present invention, the generation of the first corpus conforming to the sentence structure further includes:
[0056] For a first vocabulary in the first corpus, synonyms of the first vocabulary are obtained, and the synonyms are used to replace the first vocabulary in the first corpus to generate additional first corpus.
[0057] In the actual control process of the control model, users with different usage habits and needs may use different control instructions to perform the same control, so the control model must identify these control instructions as accurately as possible. Therefore, during control model training, a large amount of corpus is used to achieve higher recognition accuracy. In the embodiment of the present inven...
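The synonym replacement described in this embodiment can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation. The synonym table `SYNONYMS` and the function `expand_with_synonyms` are hypothetical, and the excerpt does not say where synonyms are obtained from. The sketch replaces one word at a time.

```python
# Hypothetical synonym table; the source does not say where synonyms come from.
SYNONYMS = {
    "turn on": ["switch on", "start"],
}


def expand_with_synonyms(sentence_words, synonyms):
    """Return new sentences made by replacing one word at a time with a synonym."""
    expanded = []
    for index, word in enumerate(sentence_words):
        for synonym in synonyms.get(word, []):
            replaced = sentence_words[:index] + [synonym] + sentence_words[index + 1:]
            expanded.append(" ".join(replaced))
    return expanded


new_sentences = expand_with_synonyms(["turn on", "the air conditioner"], SYNONYMS)
```

Each generated sentence keeps the original sentence structure, so it can be added to the first corpus without further checks on word order.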
Embodiment 3
[0063] On the basis of the above-mentioned embodiments, in the embodiment of the present invention, if the control vocabulary also includes a second vocabulary classification set, after generating the first corpus conforming to the sentence structure, the method further includes:
[0064] According to the saved second position information of the vocabulary in the second vocabulary classification set in the sentence structure, insert the third vocabulary in the second vocabulary classification set into the corresponding position in the first corpus, to update the first corpus.
[0065] In the actual control process of the control model, due to the different usage habits and actual needs of the user, different control instructions from the user may be obtained. Taking the user's voice control of the air conditioner as an example, some users are used to saying "please turn on the cooling mode of the air conditioner", some users are used to saying "help me turn on the cooling mod...
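The position-based insertion in this embodiment can be sketched as follows. The layout of `SECOND_SETS`, the set name `politeness` and the function `insert_third_vocabulary` are hypothetical. The sketch assumes the saved second position information is a word index into the sentence, which the excerpt does not confirm.

```python
# Hypothetical second vocabulary classification set with saved position information.
# Position 0 means "insert before the first word of the sentence".
SECOND_SETS = {
    "politeness": {"position": 0, "words": ["please", "help me"]},
}


def insert_third_vocabulary(sentence_words, second_sets):
    """Insert each third vocabulary at its saved position to update the corpus."""
    updated = []
    for entry in second_sets.values():
        for word in entry["words"]:
            words = list(sentence_words)  # copy so the original sentence is untouched
            words.insert(entry["position"], word)
            updated.append(" ".join(words))
    return updated


updated_corpus = insert_third_vocabulary(
    ["turn on", "the cooling mode of the air conditioner"], SECOND_SETS
)
```

Keeping optional words in a separate set with its own position lets the same base sentences be reused for every phrasing habit, rather than listing each variant by hand.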