
Multi-language speech synthesis model training method and device

A speech synthesis training technology, applied in speech synthesis, speech analysis, instruments, etc., that addresses the difficulty of obtaining single-speaker multilingual data sets and achieves the effect of resolving non-standard accents.

Pending Publication Date: 2021-11-26
INST OF ACOUSTICS CHINESE ACAD OF SCI
Cites: 0; Cited by: 5

AI Technical Summary

Problems solved by technology

In multilingual speech synthesis, a multilingual model is ideally trained on a high-quality multilingual data set recorded by a single speaker, but such data sets are difficult to obtain.



Embodiment Construction

[0083] The technical solutions of the embodiments of this specification will be described in detail below in conjunction with the accompanying drawings.

[0084] The embodiments of this specification disclose a method for training a multilingual speech synthesis model. The application scenario and inventive concept of this training method are introduced as follows:

[0085] At present, in multilingual speech synthesis, it is difficult to obtain a high-quality multilingual data set recorded by a single speaker, which is what would be needed to train a speech synthesis model capable of synthesizing across languages, that is, a multilingual speech synthesis model. Multilingual speech synthesis models are therefore currently trained mostly on monolingual data sets. Specifically, timbre transfer must be used to convert the timbre of the audio in each monolingual data set, so as to obtain the data sets of t...


Abstract

The embodiments of the invention provide a method and device for training a multilingual speech synthesis model. The method comprises: training a style encoder, a text encoder and a decoder based on the Mel-spectrum feature labels, sample phoneme sequences and speaker identifier labels of the sample audio of each sample language, so as to obtain a style encoder, a text encoder and a decoder that decouple the timbre (represented by the speaker identifier), the style and the text content of the audio; and then training a style predictor, using the speaker identifier label and sample phoneme sequence of the sample audio together with the style vector of the sample audio output by the trained style encoder, which serves as the label, to obtain the multilingual speech synthesis model.
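The second training stage described in the abstract can be illustrated with a toy sketch. It is not the patented implementation, and several assumptions are made that do not come from the source: stage 1 is taken as already finished, with the trained style encoder replaced by a frozen random projection (`style_encoder`); the neural style predictor is swapped for plain ridge regression on hypothetical features (a speaker one-hot vector plus a phoneme histogram); and the corpus is synthetic, with one monolingual speaker per "language". All helper names (`make_toy_corpus`, `train_style_predictor`, `predict_style`, `predictor_features`) are hypothetical. What the sketch shows is the data flow: the frozen style encoder turns each sample's Mel spectrogram into a style-vector label, and the predictor learns to reproduce that vector from the speaker identifier and phoneme sequence alone.

```python
import numpy as np

# Toy sizes; all hypothetical, not taken from the patent.
N_PHONEMES = 39   # shared phoneme inventory across languages
N_SPEAKERS = 3    # one monolingual speaker per sample language
STYLE_DIM = 8
N_MELS = 80

rng = np.random.default_rng(0)

# Stand-in for the stage-1 style encoder, already trained and now frozen:
# average the Mel frames, project them, and squash to a fixed-size style vector.
W_STYLE = rng.normal(size=(N_MELS, STYLE_DIM)) / np.sqrt(N_MELS)


def style_encoder(mel: np.ndarray) -> np.ndarray:
    """Map a (frames, N_MELS) Mel spectrogram to a (STYLE_DIM,) style vector."""
    return np.tanh(mel.mean(axis=0) @ W_STYLE)


def predictor_features(speaker_id: int, phonemes: np.ndarray) -> np.ndarray:
    """Hypothetical predictor input: speaker one-hot, phoneme histogram, bias."""
    speaker = np.zeros(N_SPEAKERS)
    speaker[speaker_id] = 1.0
    hist = np.bincount(phonemes, minlength=N_PHONEMES) / len(phonemes)
    return np.concatenate([speaker, hist, [1.0]])


def make_toy_corpus(n_per_speaker: int = 10):
    """Monolingual toy corpus: speaker k only reads phonemes of 'language' k."""
    block = N_PHONEMES // N_SPEAKERS
    samples = []
    for spk in range(N_SPEAKERS):
        for _ in range(n_per_speaker):
            length = rng.integers(10, 20)
            phonemes = rng.integers(spk * block, (spk + 1) * block, size=length)
            frames = 5 * len(phonemes)
            mel = spk + rng.normal(scale=0.5, size=(frames, N_MELS))
            samples.append((spk, phonemes, mel))
    return samples


def train_style_predictor(samples, ridge: float = 1e-3) -> np.ndarray:
    """Stage 2: regress (speaker ID, phonemes) onto frozen style-encoder outputs."""
    X = np.stack([predictor_features(spk, ph) for spk, ph, _ in samples])
    Y = np.stack([style_encoder(mel) for _, _, mel in samples])  # style-vector labels
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)


def predict_style(weights: np.ndarray, speaker_id: int, phonemes: np.ndarray) -> np.ndarray:
    """Predict a style vector without any reference audio."""
    return predictor_features(speaker_id, phonemes) @ weights


corpus = make_toy_corpus()
weights = train_style_predictor(corpus)

labels = np.stack([style_encoder(mel) for _, _, mel in corpus])
preds = np.stack([predict_style(weights, spk, ph) for spk, ph, _ in corpus])
train_mse = float(np.mean((preds - labels) ** 2))
baseline_mse = float(np.mean((labels - labels.mean(axis=0)) ** 2))
print(f"train MSE {train_mse:.4f} vs mean-style baseline {baseline_mse:.4f}")

# Cross-lingual request: speaker 0 paired with phonemes from language 2.
style = predict_style(weights, 0, np.array([27, 30, 33, 36]))
print(style.shape)
```

The design point the sketch tries to capture is that, once the predictor is trained, no reference Mel spectrogram is needed at synthesis time. Presumably the predicted style vector, together with the text encoder output and a speaker identifier, would then be fed to the stage-1 decoder (not modelled here), which is what allows a speaker's timbre to be paired with phonemes from a language that speaker never recorded. In this linear toy, such cross-lingual combinations are only additive extrapolations.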

Description

Technical field

[0001] This specification relates to the technical field of speech synthesis, and in particular to a method and device for training a multilingual speech synthesis model.

Background

[0002] Recently, with the development of deep learning, the performance of speech synthesis systems has improved greatly. In multilingual speech synthesis, however, a multilingual model is ideally trained on a high-quality multilingual data set recorded by a single speaker, and such data sets are difficult to obtain.

[0003] How to build a multilingual speech synthesis model from monolingual data sets, trained so that it achieves good style and accent, has therefore become an urgent problem.

Contents of the invention

[0004] One or more embodiments of this specification provide a multilingual speech synthesis model training method and device, so as to realize the multilingual speech synthesis mo...


Application Information

IPC(8): G10L13/02, G10L13/08, G10L25/27, G10L25/03
CPC: G10L13/02, G10L13/08, G10L25/27, G10L25/03
Inventors: 张鹏远 (Zhang Pengyuan), 尚增强 (Shang Zengqiang), 颜永红 (Yan Yonghong)
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI