Polyphone model training method, and speech synthesis method and device

A technology for speech synthesis and polyphonic characters, applied in the field of speech technology. It addresses the problems of labor-intensive rule writing, redundant rules, and long rule-summarization cycles, with the aim of avoiding inaccuracies caused by manual labeling, improving model accuracy, and shortening the training cycle.

Active Publication Date: 2016-02-17
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0009] The first method has three drawbacks.
(1) It is labor-intensive, and summarizing the rules takes a long time. Because rules may conflict with one another, each newly written rule must be verified to make sure it does not conflict with existing rules and thereby break polyphone prediction. This process is extremely time-consuming; when there are many rules, it may take several months to manually summarize a set of rules that works well.
(2) The rules are one-sided and limited. Rules summarized early and late in the process can contradict one another because the focus of consideration shifts over time. This gradually produces redundant rules and lowers overall quality, and such problems are difficult for people to find and correct. Alternatively, a rule may account only for a certain type of condition and fail to apply in other contexts.
(3) The rules scale poorly and are not robust. Polyphone prediction depends on the output of front-end natural language processing modules such as word segmentation. Manual rules can only be written against the current word segmentation results; if those results change in the future, the rules may no longer apply.
[0010] The second method often requires a large amount of manually labeled sample data to train the model. Manual labeling takes a long time and is inefficient. Moreover, human error can lower data quality, which in turn degrades the polyphone prediction of the trained model.



Examples


Embodiment Construction

[0035] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0036] The polyphone model training method for speech synthesis, speech synthesis method and device according to the embodiments of the present invention will be described below with reference to the accompanying drawings.

[0037] Figure 1 is a flowchart of a polyphone model training method for speech synthesis according to an embodiment of the present invention.

[0038] As shown in Figure 1, the polyphone model training method for speech synthesis comprises:

[0039] S1, processing the voice data set and the text set ...


Abstract

The invention discloses a polyphone model training method for speech synthesis, together with a speech synthesis method and device. The training method comprises the following steps: processing a voice data set and a text set to generate a training corpus set, where the text set corresponds to the voice data set and the training corpus set comprises texts and the Pinyin sequences corresponding to those texts; extracting feature information from the texts; and training a polyphone model according to the feature information and the Pinyin sequences. Because the training process does not require manual Pinyin labeling of the texts, the training period of the polyphone model is greatly shortened. It also avoids the inaccuracy that wrong manual labels would introduce, which improves the accuracy of the trained polyphone model.
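The last two steps named in the abstract (feature extraction, then training on features plus Pinyin sequences) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the toy corpus, the `POLYPHONES` set, the neighbouring-character features, and the frequency-count model that stands in for whatever model the invention actually trains. The sketch also assumes the first step has already produced text and Pinyin pairs with one syllable per character; how the voice data is processed to obtain them is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical toy set of polyphonic characters (assumption, not from the patent).
POLYPHONES = {"行", "长"}


def extract_features(text, i):
    """Illustrative features: the character plus its left and right neighbours."""
    prev_ch = text[i - 1] if i > 0 else "<s>"
    next_ch = text[i + 1] if i + 1 < len(text) else "</s>"
    return (text[i], prev_ch, next_ch)


def train(corpus):
    """Count readings per context and per character.

    corpus: iterable of (text, pinyin_list) pairs, one syllable per character.
    A frequency table is a stand-in for a real statistical model.
    """
    context_counts = defaultdict(Counter)
    char_counts = defaultdict(Counter)
    for text, pinyin in corpus:
        if len(text) != len(pinyin):
            continue  # skip pairs whose text and Pinyin do not line up
        for i, ch in enumerate(text):
            if ch in POLYPHONES:
                context_counts[extract_features(text, i)][pinyin[i]] += 1
                char_counts[ch][pinyin[i]] += 1
    return context_counts, char_counts


def predict(model, text, i):
    """Return the most frequent reading for text[i], backing off to the character."""
    context_counts, char_counts = model
    feats = extract_features(text, i)
    if feats in context_counts:
        return context_counts[feats].most_common(1)[0][0]
    if text[i] in char_counts:
        return char_counts[text[i]].most_common(1)[0][0]
    return None  # character was never seen as a polyphone in training


# Toy training corpus (made up for this sketch).
corpus = [
    ("银行很大", ["yin2", "hang2", "hen3", "da4"]),
    ("他行走", ["ta1", "xing2", "zou3"]),
    ("行人很多", ["xing2", "ren2", "hen3", "duo1"]),
    ("长城很长", ["chang2", "cheng2", "hen3", "chang2"]),
    ("他长大了", ["ta1", "zhang3", "da4", "le5"]),
]
model = train(corpus)
print(predict(model, "银行很大", 1))
print(predict(model, "他长大了", 1))
```

No hand-written rules and no hand-labeled polyphone tags appear in the sketch; the only supervision is the Pinyin sequence paired with each text, which is the property the abstract emphasizes.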

Description

Technical field

[0001] The invention relates to the field of speech technology, in particular to a polyphone model training method for speech synthesis, a speech synthesis method and a device.

Background technique

[0002] Speech synthesis, also known as text-to-speech (Text to Speech) technology, is a technology that can convert text information into speech and read it aloud. It involves multiple disciplines such as acoustics, linguistics, digital signal processing, and computer science. It is a cutting-edge technology in the field of Chinese information processing. The main problem to be solved is how to convert text information into audible sound information.

[0003] In the speech synthesis system, the process of converting text information into sound information is as follows: first, the input text needs to be processed, including preprocessing, word segmentation, part-of-speech tagging, polyphone prediction, prosodic level prediction, etc., and then through the acousti...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/08
CPC: G10L13/08
Inventor: 李秀林, 肖朔, 白洁, 张辉, 彭一平, 陈杰
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD