Multi-language speech synthesis model training method and device

A speech synthesis training technology, applied in speech synthesis, speech analysis, instruments, and related fields, that addresses the difficulty of obtaining single-speaker multilingual data sets and reduces the problem of non-standard accents.

Pending Publication Date: 2021-11-26

INST OF ACOUSTICS CHINESE ACAD OF SCI

Cites: 0, Cited by: 5


AI Technical Summary

Problems solved by technology

In multilingual speech synthesis, a model is ideally trained on a high-quality data set in which a single speaker covers every language, but such single-speaker multilingual data sets are difficult to obtain.



Embodiment Construction

[0083] The technical solutions of the embodiments of this specification will be described in detail below in conjunction with the accompanying drawings.

[0084] The embodiment of this specification discloses a training method for a multilingual speech synthesis model. The application scenario and inventive concept of this training method are introduced first, as follows:

[0085] At present, it is difficult to obtain the high-quality single-speaker multilingual data sets needed to train a cross-language speech synthesis model, that is, a multilingual speech synthesis model. Monolingual data sets are therefore mostly used to train multilingual speech synthesis models. Specifically, a timbre transfer method is applied to the audio in each monolingual data set to convert its timbre, so as to obtain the data sets of t...


Abstract

An embodiment of the invention provides a method and device for training a multilingual speech synthesis model. The method first trains a style encoder, a text encoder and a decoder on the Mel-spectrum feature labels, sample phoneme sequences and speaker identifier labels of the sample audio in each sample language. This yields a style encoder, a text encoder and a decoder that can decouple the timbre (represented by the speaker identifier), the style and the text content of the audio. The method then trains a style predictor using the speaker identifier label and sample phoneme sequence of the sample audio, with the style vector that the trained style encoder outputs for that audio serving as the label, to obtain the multilingual speech synthesis model.
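The second training stage described in the abstract can be illustrated with a minimal sketch of how its training pairs might be assembled. This is an illustration under stated assumptions, not the patent's implementation: the names `Sample`, `toy_style_encoder` and `build_style_predictor_pairs` are hypothetical, the style encoder is replaced by a simple per-bin time average standing in for a trained network, and the reading that the speaker identifier and phoneme sequence are the predictor's inputs while the style vector is its label is an interpretation of the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Mel = List[List[float]]        # frames x mel bins (toy Mel-spectrum features)
StyleVector = List[float]


@dataclass
class Sample:
    """One sample audio with its labels (hypothetical container)."""
    language: str
    speaker_id: int
    phonemes: List[str]
    mel: Mel


def toy_style_encoder(mel: Mel) -> StyleVector:
    """Stand-in for the trained (frozen) style encoder.

    Assumption: a real system would use a neural network here; this toy
    version just averages each mel bin over time.
    """
    n_frames = len(mel)
    n_bins = len(mel[0])
    return [sum(frame[b] for frame in mel) / n_frames for b in range(n_bins)]


def build_style_predictor_pairs(
    samples: List[Sample],
    style_encoder: Callable[[Mel], StyleVector],
) -> List[Tuple[Tuple[int, Tuple[str, ...]], StyleVector]]:
    """Assemble stage-2 pairs: (speaker_id, phonemes) -> style vector label."""
    pairs = []
    for s in samples:
        label = style_encoder(s.mel)  # label comes from the frozen encoder
        pairs.append(((s.speaker_id, tuple(s.phonemes)), label))
    return pairs


# Two monolingual samples from different speakers (made-up toy values).
samples = [
    Sample("zh", 0, ["n", "i", "h", "ao"], [[1.0, 2.0], [3.0, 4.0]]),
    Sample("en", 1, ["HH", "AH", "L", "OW"], [[0.0, 2.0], [2.0, 6.0]]),
]
pairs = build_style_predictor_pairs(samples, toy_style_encoder)
```

The style predictor itself is left out of the sketch: any regression model that maps a speaker identifier and phoneme sequence to a style vector could be fit on `pairs`. The property the sketch is meant to show is that the style labels are produced by the already trained encoder rather than annotated by hand.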

Description

Technical field

[0001] This description relates to the technical field of speech synthesis, in particular to a training method and device for a multilingual speech synthesis model.

Background

[0002] Recently, with the development of deep learning, the performance of speech synthesis systems has improved greatly. However, a multilingual speech synthesis model is ideally trained on a high-quality single-speaker multilingual data set, and such data sets are difficult to obtain.

[0003] How to train, from monolingual data sets, a multilingual speech synthesis model with better style and accent has therefore become an urgent problem to be solved.

Contents of the invention

[0004] One or more embodiments of this specification provide a multilingual speech synthesis model training method and device, so as to realize the multilingual speech synthesis mo...


Application Information


IPC(8): G10L13/02, G10L13/08, G10L25/27, G10L25/03

CPC: G10L13/02, G10L13/08, G10L25/27, G10L25/03

Inventors: 张鹏远, 尚增强, 颜永红

Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI
