
Corpus processing method and device and storage medium

A corpus processing method, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problems of high cost, long research and development time, and loss of most of the original corpus information, and it improves the efficiency and reliability of corpus screening.

Active Publication Date: 2019-09-24
GUANGZHOU DUOYI NETWORK TECH +2

AI Technical Summary

Problems solved by technology

[0004] In the process of implementing the present invention, the inventors found that the prior art offers no reliable standard for corpus selection. If a small amount of corpus is randomly selected from a large labeled corpus for training, most of the information in the original corpus may be lost, and the trained model performs poorly as a result. If, instead, the large labeled corpus is screened exhaustively, one item at a time, to obtain a small subset that retains most of the information of the original corpus, a great deal of research and development time is required.



Examples


Embodiment Construction

[0046] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0047] Referring to Figure 1, which is a schematic flowchart of the corpus processing method provided in Embodiment 1 of the present invention, the method includes steps S11 to S12.

[0048] S11. Obtain the phoneme frequency of each phoneme in the original corpus and the sentence length frequency of each sentence; wherein, the phoneme frequency of each phoneme represents the number of the same phoneme in the origin...
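The counting in step S11 can be illustrated with a minimal Python sketch. It assumes each sentence is already available as a list of phoneme strings and that sentence length means the number of phonemes; the excerpt does not specify either point, and the name `count_frequencies` and the toy corpus are hypothetical.

```python
from collections import Counter


def count_frequencies(corpus):
    """Return (phoneme_freq, length_freq) for a corpus.

    corpus: list of sentences, each a list of phoneme strings.
    Assumption: sentence length is measured in phonemes.
    """
    # Phoneme frequency: how many times each phoneme occurs in the whole corpus.
    phoneme_freq = Counter(p for sentence in corpus for p in sentence)
    # Sentence length frequency: how many sentences share each length.
    length_freq = Counter(len(sentence) for sentence in corpus)
    return phoneme_freq, length_freq


# Toy corpus, for illustration only.
corpus = [
    ["n", "i", "h", "ao"],
    ["h", "ao"],
    ["n", "i"],
]
phoneme_freq, length_freq = count_frequencies(corpus)
```

Both tables are built in a single pass over the corpus each, so the cost grows linearly with corpus size.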


Abstract

The invention discloses a corpus processing method. The corpus processing method comprises the steps: acquiring the phoneme frequency of each phoneme and the sentence length frequency of each sentence in an original corpus, wherein the phoneme frequency of each phoneme represents the number of the same phonemes in the original corpus, and the sentence length frequency of each sentence represents the number of sentences with the same sentence length in the original corpus; and calculating a frequency parameter of each sentence according to the phoneme frequency and the sentence length frequency, and taking the frequency parameter as a score of the sentence, wherein the frequency parameter is in negative correlation with the phoneme frequency, and is in negative correlation with the sentence length frequency. The invention further discloses a corpus processing device and a storage medium. According to the corpus processing method, a reliable standard is provided for corpus selection, so that the reliability of corpus sentence selection during screening can be improved, and the screening efficiency of a large number of text corpora is effectively improved, and the corpus processing method is suitable for large-scale corpus information screening tasks.
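The abstract states only that the frequency parameter is negatively correlated with both the phoneme frequency and the sentence length frequency; it does not give the formula. The sketch below therefore uses an assumed scoring rule (mean reciprocal phoneme frequency divided by the sentence length frequency) purely to illustrate one function with that property. The name `score_sentence`, the toy frequency tables, and the choice of reciprocals are all hypothetical, not the patented formula.

```python
def score_sentence(sentence, phoneme_freq, length_freq):
    """Score a sentence so that rarer phonemes and rarer lengths score higher.

    Assumed rule (not from the patent): mean of 1/phoneme_freq over the
    sentence's phonemes, divided by the frequency of the sentence's length.
    """
    if not sentence:
        return 0.0
    # Decreases as the sentence's phonemes become more common in the corpus.
    rarity = sum(1.0 / phoneme_freq[p] for p in sentence) / len(sentence)
    # Decreases as more sentences share this sentence's length.
    return rarity / length_freq[len(sentence)]


# Toy corpus and the frequency tables that correspond to it.
corpus = [["a", "b"], ["a", "a"], ["a", "b", "c"]]
phoneme_freq = {"a": 4, "b": 2, "c": 1}
length_freq = {2: 2, 3: 1}

# Rank sentences from highest to lowest score.
ranked = sorted(
    corpus,
    key=lambda s: score_sentence(s, phoneme_freq, length_freq),
    reverse=True,
)
```

Under this assumed rule, the sentence containing the rare phoneme `c` and the unique length 3 ranks first, so a screening step could keep the top-ranked sentences as the reduced corpus.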

Description

Technical field

[0001] The present invention relates to the technical field of language processing, and in particular to a corpus processing method, device, and storage medium.

Background

[0002] In recent years, as speech technology has matured, speech synthesis has gradually been applied to speech signal processing systems such as voice interaction, audio broadcasting, and personalized voice production. In social and commercial settings, speech synthesis brings convenience and richness to everyday life and has potentially broad practical value. Screening a high-quality corpus is therefore necessary to improve the research and development efficiency of speech synthesis and to adapt it to business scenarios.

[0003] In the prior art, under limited hardware resources, a speech synthesis task requires a large amount of labeled corpus, and a long training time is needed to obtain a good model. In order to improve the training efficiency...


Application Information

IPC(8): G06F17/27, G06F16/33
CPC: G06F16/3344, G06F40/216, Y02D10/00
Inventor: 徐波
Owner: GUANGZHOU DUOYI NETWORK TECH