Polyphone model training method, and speech synthesis method and device

A speech synthesis and polyphone technology in the field of speech processing. It addresses problems such as labor-intensive work, redundant rules, and long rule-summarization cycles, and it avoids inaccuracies, improves accuracy, and shortens training cycles.

Active Publication Date: 2016-02-17

BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

Cites: 5; cited by: 46


AI Technical Summary

Problems solved by technology

[0009] The first method (manually written rules) has the following problems: (1) it is labor-intensive, and summarizing the rules takes a long time.

Because rules may conflict with one another, every newly written rule must be checked to make sure it does not conflict with the existing rules; otherwise, polyphonic characters cannot be predicted correctly.

This process is extremely time-consuming. When there are many rules, manually summarizing a set of rules that works well can take several months.
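The conflict check that problem (1) describes can be sketched as follows. This is a minimal illustration and not taken from the patent: the rule representation (`Rule`, `find_conflicts`) and the two example rules for 行 are hypothetical, and readings are written as Pinyin with tone numbers (hang2 as in 银行 "bank", xing2 as in 不行). The check flags every position where two rules fire together but prescribe different readings.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    """A hypothetical hand-written rule: read `char` as `reading` when `applies` holds."""
    name: str
    char: str
    reading: str
    applies: Callable[[str, int], bool]  # (sentence, character index) -> bool


def find_conflicts(rules: List[Rule], sentences: List[str]) -> List[Tuple[str, int, List[str]]]:
    """Return (sentence, index, rule names) wherever co-firing rules disagree on the reading."""
    conflicts = []
    for sentence in sentences:
        for i, ch in enumerate(sentence):
            fired = [r for r in rules if r.char == ch and r.applies(sentence, i)]
            if len({r.reading for r in fired}) > 1:
                conflicts.append((sentence, i, sorted(r.name for r in fired)))
    return conflicts


# Two plausible-looking rules for 行 that clash on 我去银行 ("I am going to the bank").
rules = [
    Rule("hang_after_yin", "行", "hang2", lambda s, i: i > 0 and s[i - 1] == "银"),
    Rule("xing_at_end", "行", "xing2", lambda s, i: i == len(s) - 1),
]

print(find_conflicts(rules, ["我去银行", "这样不行"]))
# prints [('我去银行', 3, ['hang_after_yin', 'xing_at_end'])]
```

Each new rule has to be re-run against a test set like this, and the cost grows with the number of rules, which is the maintenance burden the paragraph describes.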

(2) The rules are one-sided and limited.

Rules summarized in earlier and later stages contradict each other to some degree because the focus of attention shifts over time. Redundant rules gradually accumulate, overall quality steadily declines, and the resulting problems are hard for people to find and correct.

In addition, a rule may account for only one type of condition and fail to apply in other contexts.

(3) The rules have low scalability and low robustness.

Polyphone prediction depends on the output of front-end natural language processing modules such as word segmentation. Manual rules can only be summarized and written against the current word segmentation results, so if the front-end segmentation results change in the future, the existing rules may no longer apply.
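The dependence on word segmentation in problem (3) can be illustrated with a small sketch. The word-keyed rule table, the fallback readings, and `predict_pinyin` are all hypothetical; the point is only that the same text, 银行行长 ("bank president"), gets different Pinyin when the segmenter's output changes, even though no rule was edited.

```python
# Hypothetical word-keyed rules: each maps a segmented word to one syllable per character.
WORD_RULES = {
    "银行": ["yin2", "hang2"],    # bank
    "行长": ["hang2", "zhang3"],  # bank president
}

# Hypothetical fallback readings used when no word-level rule matches.
DEFAULT_READINGS = {"银": "yin2", "行": "xing2", "长": "chang2"}


def predict_pinyin(words):
    """Apply word-level rules to a segmented sentence, falling back to default readings."""
    pinyin = []
    for word in words:
        if word in WORD_RULES:
            pinyin.extend(WORD_RULES[word])
        else:
            pinyin.extend(DEFAULT_READINGS.get(ch, "?") for ch in word)
    return pinyin


# The same text under two segmenter versions.
print(predict_pinyin(["银行", "行长"]))  # old segmentation: ['yin2', 'hang2', 'hang2', 'zhang3']
print(predict_pinyin(["银行行长"]))       # new segmentation: ['yin2', 'xing2', 'xing2', 'chang2']
```

Because the rules are keyed on segmenter output, a change upstream silently turns correct readings into wrong ones.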

[0010] The second method trains a model, which usually requires a large amount of manually labeled sample data. Manual labeling is slow and inefficient, and human labeling errors can lower data quality, which in turn degrades the polyphone prediction performance of the trained model.

Method used

A voice data set and its corresponding text set are processed to generate a training corpus of texts and their Pinyin sequences; feature information is extracted from the texts; and polyphone models are trained on the feature information and the Pinyin sequences, so no manual Pinyin labeling is needed.


Embodiment Construction

[0035] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0036] The polyphone model training method for speech synthesis, speech synthesis method and device according to the embodiments of the present invention will be described below with reference to the accompanying drawings.

[0037] Figure 1 is a flowchart of a polyphone model training method for speech synthesis according to an embodiment of the present invention.

[0038] As shown in Figure 1, the polyphone model training method for speech synthesis comprises the following steps:

[0039] S1: process the voice data set and the text set ...


Abstract

The invention discloses a polyphone model training method for speech synthesis, and a speech synthesis method and device. The method comprises the following steps: processing a voice data set and a text set corresponding to it to generate a training corpus set, which comprises texts and the Pinyin sequences corresponding to those texts; extracting feature information from the texts; and training polyphone models according to the feature information and the Pinyin sequences. With this training method, the Pinyin of the texts does not need to be labeled manually during polyphone model training. This greatly shortens the training period of the polyphone models, avoids inaccuracies in the trained models caused by manual labeling errors, and improves their accuracy.
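The last two steps of the abstract (extract features, then train on features and Pinyin sequences) can be sketched as below. This is a hedged illustration, not the patent's model: the excerpt does not say how Pinyin sequences are derived from the voice data or which model is trained, so the sketch assumes the (text, Pinyin) pairs already exist with one syllable per character, uses the character and its immediate neighbours as hypothetical features, and swaps in a simple count-based majority vote that backs off to the character's most frequent reading. All names (`extract_features`, `train`, `predict`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

Features = Tuple[str, str, str]  # (character, left neighbour, right neighbour)


def extract_features(text: str, i: int) -> Features:
    """Hypothetical context features for character i."""
    left = text[i - 1] if i > 0 else "<s>"
    right = text[i + 1] if i + 1 < len(text) else "</s>"
    return (text[i], left, right)


def train(corpus: List[Tuple[str, List[str]]], polyphones: Set[str]):
    """Count readings of each polyphonic character per context and overall."""
    by_context: Dict[Features, Counter] = defaultdict(Counter)
    by_char: Dict[str, Counter] = defaultdict(Counter)
    for text, pinyin in corpus:
        if len(text) != len(pinyin):
            raise ValueError("expected one Pinyin syllable per character")
        for i, ch in enumerate(text):
            if ch in polyphones:
                by_context[extract_features(text, i)][pinyin[i]] += 1
                by_char[ch][pinyin[i]] += 1
    return by_context, by_char


def predict(model, text: str, i: int) -> Optional[str]:
    """Most frequent reading for this context, else for this character, else None."""
    by_context, by_char = model
    counts = by_context.get(extract_features(text, i)) or by_char.get(text[i])
    return counts.most_common(1)[0][0] if counts else None


corpus = [("银行", ["yin2", "hang2"]), ("行走", ["xing2", "zou3"]), ("不行", ["bu4", "xing2"])]
model = train(corpus, {"行"})
print(predict(model, "我去银行", 3))  # context seen in training: hang2
print(predict(model, "行人", 0))      # unseen context, backs off to the majority: xing2
```

A counting model keeps the sketch short; the pipeline shape (corpus, features, training) is what the abstract describes, and any classifier could replace the counters.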

Description

Technical field

[0001] The invention relates to the field of speech technology, and in particular to a polyphone model training method for speech synthesis, a speech synthesis method, and a device.

Background art

[0002] Speech synthesis, also known as text-to-speech (Text to Speech) technology, converts text information into speech and reads it aloud. It draws on multiple disciplines, including acoustics, linguistics, digital signal processing, and computer science, and is a cutting-edge technology in Chinese information processing. The main problem it addresses is how to convert text information into audible sound.

[0003] In a speech synthesis system, text information is converted into sound as follows: first, the input text is processed, including preprocessing, word segmentation, part-of-speech tagging, polyphone prediction, prosodic level prediction, etc., and then through the acousti...
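The front-end order listed in [0003] can be sketched as a chain of stages, each enriching a shared record. Every stage here is a hypothetical placeholder (character-level "segmentation", a single dummy tag, a two-entry reading table); the sketch only shows that polyphone prediction runs after segmentation and tagging and can consume their output. The acoustic stage is outside this excerpt and is not sketched.

```python
from typing import Callable, Dict, List

Utterance = Dict[str, object]  # hypothetical per-sentence record passed between stages


def preprocess(u: Utterance) -> Utterance:
    u["text"] = str(u["text"]).strip()  # placeholder normalisation
    return u


def segment(u: Utterance) -> Utterance:
    u["words"] = list(str(u["text"]))  # placeholder: one word per character
    return u


def pos_tag(u: Utterance) -> Utterance:
    u["pos"] = ["x"] * len(u["words"])  # placeholder tag for every word
    return u


def predict_polyphones(u: Utterance) -> Utterance:
    # Placeholder lookup; a trained polyphone model would use words and tags as features.
    table = {"银": "yin2", "行": "hang2"}
    u["pinyin"] = [table.get(w, "?") for w in u["words"]]
    return u


def predict_prosody(u: Utterance) -> Utterance:
    u["prosody"] = []  # placeholder: no prosodic boundaries predicted
    return u


# Stage order as listed in [0003].
FRONT_END: List[Callable[[Utterance], Utterance]] = [
    preprocess, segment, pos_tag, predict_polyphones, predict_prosody,
]


def run_front_end(text: str) -> Utterance:
    u: Utterance = {"text": text}
    for stage in FRONT_END:
        u = stage(u)
    return u


print(run_front_end("  银行 ")["pinyin"])  # ['yin2', 'hang2']
```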


Application Information


Patent type & authority: Application (China)

IPC(8): G10L13/08

CPC: G10L13/08

Inventors: 李秀林 (Li Xiulin), 肖朔 (Xiao Shuo), 白洁 (Bai Jie), 张辉 (Zhang Hui), 彭一平 (Peng Yiping), 陈杰 (Chen Jie)

Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
