Multi-speaker and multi-language speech synthesis method and system thereof

A speech synthesis and speaker technology, applied in speech synthesis, speech analysis, instruments, etc., to achieve the effects of high voice quality, fast speech synthesis, and fluent conversion

Active Publication Date: 2021-03-02

SICHUAN CHANGHONG ELECTRIC CO LTD

10 Cites, 8 Cited by


AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a multi-speaker, multi-language speech synthesis method and system that solve the prior art's inability to maintain speaker consistency, and that realize multi-speaker, multi-language speech synthesis using only monolingual speech databases.



Examples


Embodiment 1

[0040] As shown in Figure 1, a multi-speaker, multi-language speech synthesis method includes:

[0041] Training phase:

[0042] Step S11: Obtain a multi-speaker, single-language speech training database and extract speech acoustic features. The multi-speaker, single-language speech training data includes multi-speaker speech data and the corresponding texts in at least two different languages. The speech acoustic features include Mel spectral features, spectral energy features, and fundamental frequency features. Optionally, Chinese and English speech databases are selected as training databases: the Chinese data set can use Biaobei's public female voice database together with our own recordings covering more than 20 speakers, and the English data set can use LJSpeech, VCTK and other public databases;
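The three feature types named in Step S11 can be illustrated with a minimal NumPy-only sketch. Everything specific here is an assumption made for illustration, not something stated in the patent: the sample rate, the frame and hop sizes, the 80 mel bands, taking energy as the L2 norm of each frame's magnitude spectrum, and estimating the fundamental frequency from the autocorrelation peak. The function names are hypothetical.

```python
import numpy as np

def frame_signal(wav, n_fft=1024, hop=256):
    """Slice a 1-D waveform into overlapping frames (no padding)."""
    n_frames = 1 + (len(wav) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return wav[idx]

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def extract_features(wav, sr=22050, n_fft=1024, hop=256, n_mels=80,
                     fmin=60.0, fmax=400.0):
    """Return (log-mel, energy, f0) per frame; all settings are assumptions."""
    frames = frame_signal(wav, n_fft, hop)
    mag = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    mel = np.log(mag @ mel_filterbank(sr, n_fft, n_mels).T + 1e-5)
    energy = np.linalg.norm(mag, axis=1)

    # F0 from the autocorrelation peak inside the allowed lag range;
    # frames with a weak peak are treated as unvoiced (f0 = 0).
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    f0 = np.zeros(len(frames))
    for t, frame in enumerate(frames):
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[n_fft - 1:]
        if ac[0] <= 0:
            continue  # silent frame
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
        if ac[lag] / ac[0] > 0.3:
            f0[t] = sr / lag
    return mel, energy, f0
```

All three outputs share the same frame axis, which is what lets frame-level durations be matched against them later in training.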

[0043] Step S12: Process the texts in different languages of the speech training database into a unified representation, that is, process them into a un...

Embodiment 2

[0056] A multi-speaker, multi-language speech synthesis system, including a text processing module, an information marking module, an information encoding module, an acoustic feature output module and a vocoder module, wherein:

[0057] The text processing module is used to normalize the text, classify the text according to the language and process the text in different languages into a unified expression;

[0058] Optionally, process the text in different languages of the speech database into a unified phoneme expression, or process the text in different languages into a unified Unicode encoding expression; if used for training, use the MFA (Montreal Forced Aligner) algorithm to align the text and audio in the different languages, obtain the aligned text and the duration of each text unit, and convert each duration into a number of frames, such that the sum of the frame counts equals the number of frames of the extracted Mel spectrum features;
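The Unicode option and the duration-to-frame conversion described above can be sketched as follows. This is an illustration under stated assumptions: the aligner output is taken to be a list of cumulative end times in seconds, one per text unit, and the sample rate and hop size are assumed values. Both function names are hypothetical and are not part of MFA or the patent.

```python
def to_unicode_ids(text):
    """Map every character to its Unicode code point, whatever the language."""
    return [ord(ch) for ch in text]

def durations_to_frames(end_times, n_mel_frames, sr=22050, hop=256):
    """Convert aligned end times (seconds) into per-unit frame counts.

    Cumulative boundaries are rounded rather than individual durations,
    so rounding error does not accumulate. The last boundary is pinned to
    n_mel_frames so the counts sum exactly to the Mel spectrogram length.
    """
    boundaries = [min(round(t * sr / hop), n_mel_frames) for t in end_times]
    boundaries[-1] = n_mel_frames
    frames, prev = [], 0
    for b in boundaries:
        frames.append(b - prev)
        prev = b
    return frames

# Three aligned units ending at 0.12 s, 0.30 s and 0.41 s, 36 Mel frames.
frame_counts = durations_to_frames([0.12, 0.30, 0.41], n_mel_frames=36)
```

Pinning the final boundary is one simple way to satisfy the constraint that the frame counts sum to the Mel feature length; other schemes, such as distributing the remainder across units, would work equally well.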

[0059] The information marking m...


Abstract

The invention discloses a multi-speaker and multi-language speech synthesis method. The method comprises the following steps: extracting speech acoustic features; processing the texts in different languages into a unified representation, and aligning the audio with the texts to obtain duration information; constructing a speaker space and a language space, generating a speaker id and a language id, extracting a speaker vector and a language vector, adding the speaker vector and the language vector to the initial speech synthesis model, and training the initial speech synthesis model using the aligned text, duration information and speech acoustic features to obtain a speech synthesis model; processing the to-be-synthesized text to generate the speaker id and the language id; and inputting the speaker id, the text and the language id into the speech synthesis model, outputting speech acoustic features and converting the speech acoustic features into audio. A system is also disclosed. The method disentangles speaker characteristics from language characteristics, so the speaker or the language can be switched simply by changing the corresponding id.
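The id-based conditioning in the abstract can be pictured with a small NumPy sketch. It assumes, purely for illustration, that the speaker and language vectors are rows of two lookup tables and that they are added to every time step of an encoder output; the abstract does not say where in the model they are added. The class and attribute names are hypothetical, and the tables are random here, whereas in a real model they would be learned during training.

```python
import numpy as np

class ConditioningTables:
    """Hypothetical speaker and language lookup tables."""

    def __init__(self, n_speakers, n_languages, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.speaker_table = rng.normal(size=(n_speakers, dim))
        self.language_table = rng.normal(size=(n_languages, dim))

    def condition(self, encoder_out, speaker_id, language_id):
        """Add the selected vectors to every time step of a (T, dim) array."""
        speaker_vec = self.speaker_table[speaker_id]
        language_vec = self.language_table[language_id]
        return encoder_out + speaker_vec + language_vec

tables = ConditioningTables(n_speakers=4, n_languages=2, dim=8)
encoder_out = np.zeros((5, 8))  # stand-in for encoded text, 5 time steps

# Same text and speaker, two different languages: only the id changes.
same_speaker_lang0 = tables.condition(encoder_out, speaker_id=1, language_id=0)
same_speaker_lang1 = tables.condition(encoder_out, speaker_id=1, language_id=1)
```

Because the two vectors come from separate tables, swapping one id leaves the other contribution untouched, which is the property the abstract describes as changing the speaker or the language by changing the id alone.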

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, in particular to a multi-speaker and multi-language speech synthesis method and system.

Background art

[0002] Speech synthesis is a technology that converts text information into speech information, that is, converting text information into arbitrary audible speech, involving multiple disciplines such as acoustics, linguistics, and computer science. However, how to build a multi-speaker and multi-language speech synthesis system using a monolingual speech database while maintaining speaker consistency has always been a difficult problem. Traditional multilingual speech synthesis systems rely on multilingual speech databases. However, multilingual databases are difficult to obtain in practice (it is difficult to find speakers who are proficient in multiple languages to record speech data), and it is not possible to arbitrarily modify the speaker's timbre, language pronunciation, e...


Application Information


IPC(8): G10L13/047, G10L13/04, G10L13/08, G10L25/24

CPC: G10L13/047, G10L13/08, G10L25/24

Inventor: 朱海, 王昆, 周琳珉, 刘书君

Owner: SICHUAN CHANGHONG ELECTRIC CO LTD
