Corpus generation method, apparatus and device, and computer readable storage medium

A corpus and similarity calculation technology, applied in the field of big data processing, can solve the problems of not satisfying sentence grammar habits, rigid grammar, and inability to meet the requirements of real input simulation, and achieve the effect of improving effective coverage and expanding the number of

Pending Publication Date: 2021-04-02
深圳赛安特技术服务有限公司

AI Technical Summary

Problems solved by technology

[0004] At present, the industry uses various generation methods, such as generating corpora by arranging word slots, but most of these methods produce sentences that do not follow natural grammatical habits and cannot simulate the real input of actual user scenarios. At the same time, sentences that are s


Embodiment Construction

[0025] It should be understood that the specific embodiments described herein are merely intended to illustrate the invention and are not intended to limit the invention.

[0026] The present invention provides a corpus generation method. Figure 1 shows a flow diagram of a corpus generation method provided by an embodiment of the present invention. The method can be performed by a device, which can be implemented in software and/or hardware.

[0027] In this embodiment, the corpus generation method includes:

[0028] S110, according to an intention corpus generation instruction, obtain the intention corpus template corresponding to that instruction.

[0029] Specifically, the processor receives the intention corpus generation instruction and obtains the intention corpus template on the basis of that instruction. The instruction includes the intention for which a corpus is to be generated, for example...
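Step S110 can be pictured as a lookup from the intention named in the instruction to a stored template. The following is a minimal sketch under assumptions: the `IntentTemplate` type, the `TEMPLATES` store, the `get_template` function, the shape of the instruction (a dict with an `"intent"` key), and the example intents are all hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class IntentTemplate:
    """Hypothetical intention corpus template: an intent name plus its word slot fields."""
    intent: str
    slot_fields: Tuple[str, ...]


# Hypothetical template store; the patent does not say how templates are stored.
TEMPLATES: Dict[str, IntentTemplate] = {
    "check_weather": IntentTemplate("check_weather", ("city", "date", "weather")),
    "book_flight": IntentTemplate("book_flight", ("origin", "destination", "date")),
}


def get_template(instruction: dict) -> IntentTemplate:
    """Return the template registered for the intent carried by the instruction."""
    intent = instruction["intent"]
    try:
        return TEMPLATES[intent]
    except KeyError:
        raise ValueError(f"no corpus template registered for intent {intent!r}") from None
```

A call such as `get_template({"intent": "check_weather"})` would return the template whose word slot fields are then used by the later steps.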


Abstract

The invention relates to big data processing, and discloses a corpus generation method, which comprises the steps of obtaining an intention corpus template corresponding to an intention corpus generation instruction according to the intention corpus generation instruction; obtaining word slot fields in the intention corpus template, and performing permutation and combination on the positions of the word slot fields to obtain a primary corpus statement with a blank slot position; performing synonym extension on the word slot fields in the primary corpus statement through a preset synonym extension model to obtain an extended primary corpus statement with a blank slot position; and according to the part-of-speech of the word required to be filled in the blank slot of the extended primary corpus statement, selecting a noise word of a corresponding part-of-speech from a preset noise word library, and filling the noise word into the blank slot of the extended primary corpus statement to generate an intention corpus. The invention also relates to blockchain technology: the preset synonym extension model is stored in a blockchain. According to the method, the effective coverage of corpora can be improved, and effective intention recognition test corpora can be generated in batches.
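The steps named in the abstract (permuting word slot positions, synonym extension, and part-of-speech-matched noise filling) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the `SYNONYMS` dictionary stands in for the preset synonym extension model, the `NOISE_WORDS` dictionary stands in for the preset noise word library, the single blank slot is placed at the start of each statement for simplicity, and all function names and sample words are hypothetical.

```python
import random
from itertools import permutations, product

# Stand-in for the preset synonym extension model (assumption: a plain lookup table).
SYNONYMS = {
    "check": ["check", "look up"],
    "weather": ["weather", "forecast"],
}

# Stand-in for the preset noise word library, keyed by part of speech (assumption).
NOISE_WORDS = {
    "adverb": ["please", "kindly"],
    "interjection": ["um", "well"],
}


def primary_statements(slot_fields, blank_pos):
    """Permute the slot field positions; each statement gets one blank slot tagged with a POS."""
    for order in permutations(slot_fields):
        yield [("blank", blank_pos)] + [("slot", word) for word in order]


def extend_synonyms(statement):
    """Expand every slot field into its synonyms, leaving blank slots untouched."""
    options = [
        SYNONYMS.get(value, [value]) if kind == "slot" else [value]
        for kind, value in statement
    ]
    for combo in product(*options):
        yield [(kind, value) for (kind, _), value in zip(statement, combo)]


def fill_noise(statement, rng):
    """Fill each blank slot with a noise word of the required part of speech."""
    return " ".join(
        rng.choice(NOISE_WORDS[value]) if kind == "blank" else value
        for kind, value in statement
    )


def generate_corpus(slot_fields, blank_pos="adverb", seed=0):
    """Run the three steps in order and return the generated sentences."""
    rng = random.Random(seed)
    return [
        fill_noise(extended, rng)
        for primary in primary_statements(slot_fields, blank_pos)
        for extended in extend_synonyms(primary)
    ]
```

With the slot fields `["check", "Shenzhen", "weather"]`, the sketch yields 6 position orders times 4 synonym combinations, so 24 sentences, each beginning with an adverb drawn from the noise table. A real system would presumably filter out orderings that are ungrammatical; that filtering is outside this sketch.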

Description

Technical field

[0001] The present invention relates to big data processing, and more particularly to a corpus generation method, apparatus, electronic device, and computer readable storage medium.

Background technique

[0002] In the field of natural language processing (NLP), obtaining corpora has always been a major problem for testing. How to quickly generate large quantities of effective test text is a problem that NLP algorithm testers must face and solve. In a natural language processing system, the intention recognition algorithm subsystem covers intention management, dialog management, word management, dictionary management, and answer configuration management. The dialog management function of the intention recognition algorithm implements the relationship between intentions and dialog templates, and also the relationship between dialog templates and slots. A dialog template is composed of different gallog...


Application Information

IPC(8): G06F40/186, G06F40/205, G06F40/247
CPC: G06F40/186, G06F40/205, G06F40/247
Inventor: 陆海鹏
Owner: 深圳赛安特技术服务有限公司