Training method for multiple personalized acoustic models, and voice synthesis method and voice synthesis device

The acoustic model and speech synthesis technology, applied in the field of speech, addresses problems such as low acoustic-model accuracy, unstable timbre, and unnatural speech, so as to meet the demand for personalized speech and improve user experience.

Active Publication Date: 2015-12-23
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
6 Cites, 71 Cited by

AI Technical Summary

Problems solved by technology

Because two speakers produce their original speech from different texts, and the pronunciation of the same syllable differs noticeably across sentence contexts, mapping the same sound between different sentences of different speakers tends to make the trained personalized acoustic model inaccurate, which results in unnatural synthesized speech.


Examples

Detailed Description of Embodiments

[0039] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0040] The following describes the personalized multi-acoustic model training method for speech synthesis, speech synthesis method and device according to the embodiments of the present invention with reference to the accompanying drawings.

[0041] FIG. 1 is a flowchart of a training method for a personalized multi-acoustic model for speech synthesis according to an embodiment of the present invention.

[0042] As shown in FIG. 1, the training method of the personalized multi-acoustic model for speech synthesis inc...


Abstract

The invention discloses a training method for multiple personalized acoustic models, a voice synthesis method and a voice synthesis device. The training method comprises the following steps: training a reference acoustic model according to first acoustic feature data of training voice data and first text annotation data corresponding to the training voice data; acquiring voice data of a target user; training a first target user acoustic model according to the reference acoustic model and the voice data; generating second acoustic feature data of the first text annotation data according to the first target user acoustic model and the first text annotation data; training a second target user acoustic model according to the first text annotation data and the second acoustic feature data. According to the model training method disclosed by the embodiment, in a process of training a target user acoustic model, the requirement on the scale of voice data of the target user is lowered, and a plurality of personalized acoustic models including the voice features of the target user can be trained by using a small amount of user voice data.
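The five training steps in the abstract can be sketched as a toy pipeline. This is only an illustrative sketch, not the patented implementation: the "acoustic models" here are plain lookup tables from a text label to a mean feature value, the adaptation step is a simple global offset, and all function names and data values (`train_table_model`, `adapt_model`, `generate_features`, the sample labels and numbers) are hypothetical. A real system would presumably use far richer features and model types, and the second model could be of a different type than the first.

```python
from collections import defaultdict
from statistics import mean


def train_table_model(labels, features):
    """Fit a toy 'acoustic model': the mean feature value seen for each label."""
    buckets = defaultdict(list)
    for label, value in zip(labels, features):
        buckets[label].append(value)
    return {label: mean(values) for label, values in buckets.items()}


def adapt_model(reference_model, user_labels, user_features):
    """Toy adaptation (an assumption): shift every reference mean by the
    average difference between the user's features and the reference."""
    offset = mean(
        feature - reference_model[label]
        for label, feature in zip(user_labels, user_features)
    )
    return {label: value + offset for label, value in reference_model.items()}


def generate_features(model, labels):
    """Predict one feature value per text annotation label."""
    return [model[label] for label in labels]


# Step 1: train the reference model on large-scale training data.
first_text_annotations = ["a", "b", "a", "c", "b", "c"]
first_acoustic_features = [1.0, 2.0, 3.0, 4.0, 4.0, 6.0]
reference_model = train_table_model(first_text_annotations, first_acoustic_features)

# Step 2: acquire a small amount of target-user data (label "c" is never spoken).
user_annotations = ["a", "b"]
user_features = [2.5, 3.5]

# Step 3: adapt the reference model into the first target user model.
first_target_model = adapt_model(reference_model, user_annotations, user_features)

# Step 4: generate second acoustic feature data for the first text annotations.
second_acoustic_features = generate_features(first_target_model, first_text_annotations)

# Step 5: train the second target user model on the generated data.
second_target_model = train_table_model(first_text_annotations, second_acoustic_features)

print(second_target_model)
```

In this sketch the second model covers label "c" even though the target user never recorded it, which illustrates how generated feature data can reduce the amount of user speech required.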

Description

Technical field

[0001] The invention relates to the technical field of speech, in particular to a training method for a personalized multi-acoustic model for speech synthesis, a speech synthesis method, and a device.

Background art

[0002] Speech synthesis, also known as text-to-speech (TTS) technology, converts text information into speech and reads it aloud. It involves multiple disciplines such as acoustics, linguistics, digital signal processing, and computer science. It is a cutting-edge technology in the field of Chinese information processing. The main problem to be solved is how to convert text information into audible sound information.

[0003] In a speech synthesis system, the process of converting text information into sound information is as follows: first, the input text needs to be processed, including preprocessing, word segmentation, part-of-speech tagging, polyphone prediction, prosodic level prediction, etc., and t...

Application Information

IPC(8): G10L13/02, G10L15/02, G10L15/183
CPC: G10L13/02, G10L13/10, G10L15/02, G10L15/183, G10L13/08, G10L15/04, G10L15/063, G10L15/142, G10L15/1807, G10L2015/0631
Inventor: 李秀林
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD