Text generation method and device

A text generation and model technology, applied in instrumentation, computing, semantic analysis, etc., that addresses problems such as insufficient training of the text recognition model, destruction of the semantic structure of the text corpus, and the inability to guarantee the quantity and quality of the text corpus.

Active Publication Date: 2020-08-21
BEIJING SINOVOICE TECH CO LTD

AI Technical Summary

Problems solved by technology

However, limited by collection conditions, the quantity and quality of the text corpus sometimes cannot be guaranteed. The corpus is then sparse, and the text recognition model is insufficiently trained.

[0004] To expand the text corpus, existing technology usually adds noise: starting from the original text corpus, it generates a new text corpus by synonym replacement, random word insertion, random word deletion, random word exchange, and similar means. Synonym replacement, however, may leave the new text corpus highly similar to the original, so the expansion effect is poor. Random word insertion, deletion, and exchange may destroy the semantic structure of the text corpus, which in turn may affect the training efficiency of the text recognition model and the accuracy of its recognition results.
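The four noise operations named in paragraph [0004] can be sketched as follows. This is a minimal illustration, not the procedure of any particular prior-art system: the function names, the `synonyms` dictionary, the `vocabulary` list, and the explicit `rng` argument are all assumptions introduced here.

```python
import random

def synonym_replacement(tokens, synonyms):
    """Replace each token that has an entry in the (hypothetical) synonym table."""
    return [synonyms.get(t, t) for t in tokens]

def random_insertion(tokens, vocabulary, rng):
    """Insert one word drawn from a (hypothetical) vocabulary at a random position."""
    out = list(tokens)
    out.insert(rng.randrange(len(out) + 1), rng.choice(vocabulary))
    return out

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one token."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, rng):
    """Exchange the tokens at two randomly chosen positions."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["book", "a", "flight", "to", "Beijing"]
    print(synonym_replacement(sentence, {"book": "reserve"}))
    print(random_insertion(sentence, ["please", "now"], rng))
    print(random_deletion(sentence, 0.3, rng))
    print(random_swap(sentence, rng))
```

Synonym replacement changes only individual words, which is why its output stays close to the original. The other three operations ignore word order and grammatical role, which is how they can break the semantic structure of a sentence.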



Embodiment Construction

[0029] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present invention and to fully convey the scope of the present invention to those skilled in the art.

[0030] Figure 1 is a flow chart of the steps of a text generation method provided by an embodiment of the present invention. As shown in Figure 1, the method may include:

[0031] Step 101. Obtain the first segmented word in the first text corpus.

[0032] In the embodiment of the present invention, the expansion of the text corpus can be obtained by converting the text corpus with different semantic structures into the target text corpus of...


Abstract

The invention provides a text generation method and device, and relates to the technical field of natural language. The method comprises: determining a first segmented word in a first text corpus; when the first segmented word matches a preset feature, replacing the first segmented word with the feature mark corresponding to that preset feature to obtain a second text corpus; obtaining a first word vector corresponding to the first segmented word in the first text corpus and a first feature vector corresponding to the feature mark in the second text corpus; inputting the first word vector and the first feature vector into a text generation model to obtain output target word vectors of a target semantic structure; and obtaining a target text corpus from the target word vectors. The target text corpus obtained by the embodiment of the invention contains the required, complete target semantic structure, and the way the first text corpus is obtained is not restricted, so the problems of high similarity among the expanded target text corpora and poor expansion effect are avoided.
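The feature-mark replacement step of the abstract can be sketched as below. The source does not say what the preset features or feature marks look like, so the regular-expression patterns, the `<DATE>` and `<NUM>` marks, and the function name are hypothetical choices made only for illustration. The word-vector, feature-vector, and text generation model steps are not covered.

```python
import re

# Hypothetical preset features: each pairs a feature mark with a pattern
# that a segmented word must match in full. Order matters: the first match wins.
PRESET_FEATURES = [
    ("<DATE>", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("<NUM>", re.compile(r"\d+")),
]

def replace_with_feature_marks(first_corpus_words):
    """Build the second text corpus from the segmented words of the first.

    A word that matches a preset feature is replaced by that feature's mark;
    every other word is copied unchanged.
    """
    second_corpus_words = []
    for word in first_corpus_words:
        mark = next(
            (m for m, pattern in PRESET_FEATURES if pattern.fullmatch(word)),
            None,
        )
        second_corpus_words.append(mark if mark is not None else word)
    return second_corpus_words

if __name__ == "__main__":
    words = ["meet", "on", "2020-08-21", "at", "9"]
    print(replace_with_feature_marks(words))
```

Keeping the first and second corpora aligned word for word, as this sketch does, makes it straightforward to pair each word vector from the first corpus with the feature vector at the same position in the second.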

Description

Technical field

[0001] The present invention relates to the technical field of natural language, in particular to a text generation method and device.

Background technique

[0002] In order to meet the growing needs of named entity recognition, speech recognition, speech synthesis, machine translation, etc., text recognition models are needed for text recognition.

[0003] At present, the corresponding text recognition model is usually trained by collecting text corpus in different scenarios, different fields, and different language families. However, in order to ensure that the text recognition model is fully trained and improve the accuracy of the recognition results, a large amount of text corpus that meets the model training requirements is usually required. However, limited by the acquisition conditions, sometimes the quantity and quality of the text corpus cannot be guaranteed, making the text corpus sparse, resulting in insufficient training of the training text reco...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/284, G06F40/30
CPC: G06F40/284, G06F40/30
Inventor: 吴帅, 李健, 武卫东
Owner: BEIJING SINOVOICE TECH CO LTD