Training method for multiple personalized acoustic models, and voice synthesis method and voice synthesis device

An acoustic model training and speech synthesis technology, applied in the field of speech, that addresses problems such as low acoustic-model accuracy, unstable timbre, and unnatural speech, thereby meeting the demand for personalized speech and improving the user experience.

Active Publication Date: 2015-12-23

BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

Cites: 6; Cited by: 71


Problems solved by technology

Because the two speakers produce the original speech from different texts, and the pronunciation of the same syllable differs markedly between sentence contexts, mapping the same sound across different sentences from different speakers easily makes the trained personalized acoustic model inaccurate, resulting in unnatural synthesized speech.

[0010] For the second method, because the decision tree is a shallow model, its descriptive power is limited. Especially when the amount of user voice data is relatively small, the accuracy of the generated personalized acoustic model is low, so the predicted output parameters may be incoherent. This causes jumps in the synthesized voice, unstable timbre, and similar artifacts, which make the voice sound unnatural.



Embodiment Construction

[0039] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0040] The following describes the personalized multi-acoustic model training method for speech synthesis, speech synthesis method and device according to the embodiments of the present invention with reference to the accompanying drawings.

[0041] Figure 1 is a flowchart of a training method for a personalized multi-acoustic model for speech synthesis according to an embodiment of the present invention.

[0042] As shown in Figure 1, the training method of the personalized multi-acoustic model for speech synthesis inc...


Abstract

The invention discloses a training method for multiple personalized acoustic models, a voice synthesis method and a voice synthesis device. The training method comprises the following steps: training a reference acoustic model according to first acoustic feature data of training voice data and first text annotation data corresponding to the training voice data; acquiring voice data of a target user; training a first target user acoustic model according to the reference acoustic model and the voice data; generating second acoustic feature data of the first text annotation data according to the first target user acoustic model and the first text annotation data; training a second target user acoustic model according to the first text annotation data and the second acoustic feature data. According to the model training method disclosed by the embodiment, in a process of training a target user acoustic model, the requirement on the scale of voice data of the target user is lowered, and a plurality of personalized acoustic models including the voice features of the target user can be trained by using a small amount of user voice data.
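The five training steps in the abstract can be sketched as a small data-flow example. This is only an illustration under loud assumptions: the "acoustic model" here is a toy lookup table from a text label to a mean feature value, adaptation is a single global offset, and every function name and data value is hypothetical. This excerpt does not state which model types the patent uses, so the sketch shows only the order in which data passes between the steps.

```python
from collections import defaultdict
from statistics import mean


def fit_label_means(labels, features):
    """Toy 'acoustic model': map each text label to its mean feature value."""
    buckets = defaultdict(list)
    for label, value in zip(labels, features):
        buckets[label].append(value)
    return {label: mean(values) for label, values in buckets.items()}


def adapt_to_user(reference_model, user_labels, user_features):
    """Shift the reference model by the average user-vs-reference offset.

    Hypothetical adaptation rule, chosen only to keep the example small.
    """
    offsets = [
        value - reference_model[label]
        for label, value in zip(user_labels, user_features)
        if label in reference_model
    ]
    shift = mean(offsets) if offsets else 0.0
    return {label: value + shift for label, value in reference_model.items()}


def generate_features(model, labels):
    """Predict one feature value per text label."""
    return [model[label] for label in labels]


# Step 1: train the reference model on the first (large) training set.
first_labels = ["a", "a", "b", "c"]
first_features = [1.0, 3.0, 4.0, 6.0]
reference_model = fit_label_means(first_labels, first_features)

# Step 2: acquire a small amount of target-user data (label "c" is never spoken).
user_labels = ["a", "b"]
user_features = [2.5, 4.5]

# Step 3: train the first target-user model from the reference model and user data.
first_user_model = adapt_to_user(reference_model, user_labels, user_features)

# Step 4: generate second acoustic features for the first text annotations.
second_features = generate_features(first_user_model, first_labels)

# Step 5: train the second target-user model on the full-size generated data.
second_user_model = fit_label_means(first_labels, second_features)
```

The point of the sketch is step 4: the second model is trained on features generated for the whole first annotation set, so it also covers labels the target user never recorded (here, `"c"`), even though only two user utterances were available.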

Description

Technical field

[0001] The invention relates to the technical field of speech, in particular to a training method for a personalized multi-acoustic model for speech synthesis, a speech synthesis method, and a device.

Background

[0002] Speech synthesis, also known as text-to-speech (TTS) technology, converts text information into speech and reads it aloud. It involves multiple disciplines such as acoustics, linguistics, digital signal processing, and computer science. It is a cutting-edge technology in the field of Chinese information processing. The main problem to be solved is how to convert text information into audible sound information.

[0003] In a speech synthesis system, text information is converted into sound information as follows: first, the input text is processed, including preprocessing, word segmentation, part-of-speech tagging, polyphone prediction, prosodic level prediction, etc., and t...


Application Information


IPC(8): G10L13/02, G10L15/02, G10L15/183

CPC: G10L13/02, G10L13/10, G10L15/02, G10L15/183, G10L13/08, G10L15/04, G10L15/063, G10L15/142, G10L15/1807, G10L2015/0631

Inventor: 李秀林

Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
