A multi-speaker, multi-language speech synthesis method and system

A speech synthesis technology, applied to multi-speaker, multi-language speech synthesis methods and systems, that achieves high sound quality, fast speed, and fluent conversion.

Active Publication Date: 2022-04-15

SICHUAN CHANGHONG ELECTRIC CO LTD


Problems solved by technology

[0003] The purpose of the present invention is to provide a multi-speaker, multi-language speech synthesis method and system that solves the prior art's inability to keep the speaker consistent, and that realizes multi-speaker, multi-language speech synthesis using monolingual speech databases.


Examples


Embodiment 1

[0040] As shown in Figure 1, a multi-speaker, multi-language speech synthesis method includes:

[0041] Training phase:

[0042] Step S11: Obtain a multi-speaker, single-language speech training database and extract speech acoustic features. The training data includes multi-speaker speech data and the corresponding texts in at least two different languages. The speech acoustic features include Mel spectral features, spectral energy features, and fundamental frequency features. Optionally, Chinese and English speech databases are selected as training databases: the Chinese data can use Biaobei's public female voice database together with a self-recorded speech database covering more than 20 speakers, and the English data can use LJSpeech, VCTK, and other public databases;
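Frame-level feature extraction of the kind named in Step S11 can be illustrated with a minimal sketch. It computes only a per-frame energy value, and it uses time-domain RMS as a simplified stand-in for the spectral energy feature; Mel spectral and fundamental frequency features would normally come from a signal processing library and are not shown. The frame length, hop length, sample rate, and the synthetic sine input are all assumptions chosen for illustration, not values from the patent.

```python
import math

def frame_signal(samples, frame_length, hop_length):
    """Split a 1-D sample list into overlapping frames (no padding)."""
    if len(samples) < frame_length:
        return []
    n_frames = 1 + (len(samples) - frame_length) // hop_length
    return [samples[i * hop_length : i * hop_length + frame_length]
            for i in range(n_frames)]

def rms_energy(frame):
    """Root-mean-square energy of one frame (assumed stand-in for spectral energy)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

# Synthetic stand-in for recorded speech: 1 s of a 200 Hz sine at 16 kHz.
SAMPLE_RATE = 16000
signal = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
          for n in range(SAMPLE_RATE)]

frames = frame_signal(signal, frame_length=1024, hop_length=256)
energies = [rms_energy(f) for f in frames]
print(len(frames), round(energies[0], 3))
```

Each entry of `energies` lines up with one analysis frame, which is the same time grid on which Mel spectral frames and duration information are later matched.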

[0043] Step S12: Process the texts in different languages of the speech training database into a unified representation, that is, process them into a un...

Embodiment 2

[0056] A multi-speaker, multi-language speech synthesis system, including a text processing module, an information marking module, an information encoding module, an acoustic feature output module and a vocoder module, wherein:

[0057] The text processing module is used to normalize the text, classify the text according to the language and process the text in different languages into a unified expression;
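A text processing step of this kind can be sketched as follows, assuming Unicode code points as the unified expression. The script-based language check and the NFKC-plus-lowercase normalization are illustrative assumptions; the function names `classify_language` and `to_unified_ids` are hypothetical and do not come from the patent.

```python
import unicodedata

def classify_language(text):
    """Crude script-based guess: 'zh' if any CJK ideograph is present, else 'en'."""
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            return "zh"
    return "en"

def to_unified_ids(text):
    """Normalize the text, then map every character to its Unicode code point."""
    normalized = unicodedata.normalize("NFKC", text).strip().lower()
    return [ord(ch) for ch in normalized]

for sentence in ["Hello", "你好"]:
    print(classify_language(sentence), to_unified_ids(sentence))
```

Because both languages end up in the same integer ID space, a single model input layer can consume either one without language-specific vocabularies.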

[0058] Optionally, process the text in different languages of the speech database into a unified phoneme expression, or process the text in different languages into a unified Unicode encoding expression. If the text is used for training, use the MFA (Montreal Forced Aligner) algorithm to align the text and audio in the different languages, obtain the aligned text and the duration corresponding to each unit of text, and convert each duration into a number of frames, such that the sum of the per-unit frame counts equals the number of frames in the extracted Mel spectrum features;
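The duration-to-frame conversion can be sketched as below. Rounding each duration independently can make the total drift from the Mel frame count, so this sketch rounds cumulative boundaries instead and pins the final boundary to the Mel length. The sample rate, hop length, and example durations are assumed values, and `durations_to_frames` is a hypothetical helper, not part of MFA or of the patent.

```python
def durations_to_frames(durations_sec, sample_rate, hop_length, n_mel_frames):
    """Convert per-unit durations (seconds) to frame counts that sum to n_mel_frames."""
    frames_per_second = sample_rate / hop_length
    boundaries = [0]
    elapsed = 0.0
    for d in durations_sec:
        elapsed += d
        # Round the cumulative end time so rounding errors do not accumulate.
        boundaries.append(min(n_mel_frames, round(elapsed * frames_per_second)))
    # Pin the last boundary so the counts cover every Mel frame.
    boundaries[-1] = n_mel_frames
    return [end - start for start, end in zip(boundaries, boundaries[1:])]

# Three aligned units lasting 0.10 s, 0.25 s, and 0.15 s (assumed values).
frame_counts = durations_to_frames(
    [0.10, 0.25, 0.15], sample_rate=22050, hop_length=256, n_mel_frames=44
)
print(frame_counts, sum(frame_counts))
```

With a non-empty duration list, the counts always telescope to `n_mel_frames`, which is the equality the paragraph above requires.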

[0059] The information marking m...


Abstract

The invention discloses a multi-speaker, multi-language speech synthesis method, which includes: extracting speech acoustic features; processing texts in different languages into a unified representation and aligning audio and text to obtain duration information; constructing a speaker space and a language space, generating speaker IDs and language IDs, extracting a speaker vector and a language vector, and adding them to the initial speech synthesis model; training the initial speech synthesis model with the aligned text, duration information, and speech acoustic features to obtain the speech synthesis model; generating a speaker ID and a language ID after processing the text to be synthesized; and inputting the speaker ID, text, and language ID into the speech synthesis model, which outputs the acoustic features of the speech, which are then converted into audio. A system is also disclosed. The present invention disentangles the speaker features from the language features, so that only the ID needs to be changed to switch the speaker or the language.
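The ID-based conditioning described in the abstract can be illustrated with a toy sketch. It assumes that "adding" the vectors means element-wise addition to each encoder state, which the abstract does not specify, and it uses random lookup tables in place of learned embeddings. The table sizes, the dimension, and the names `make_table` and `condition` are hypothetical.

```python
import random

DIM = 4  # toy embedding size (assumption)

def make_table(n_entries, dim, seed):
    """Random lookup table standing in for a learned embedding matrix."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_entries)]

speaker_table = make_table(n_entries=3, dim=DIM, seed=0)   # 3 hypothetical speakers
language_table = make_table(n_entries=2, dim=DIM, seed=1)  # 0 = Chinese, 1 = English

def condition(encoder_states, speaker_id, language_id):
    """Add the speaker and language vectors to every encoder state (assumed scheme)."""
    spk = speaker_table[speaker_id]
    lang = language_table[language_id]
    return [[h + s + l for h, s, l in zip(state, spk, lang)]
            for state in encoder_states]

# Placeholder encoder output for a 5-unit text (all zeros for clarity).
text_states = [[0.0] * DIM for _ in range(5)]
same_text_speaker_0 = condition(text_states, speaker_id=0, language_id=1)
same_text_speaker_2 = condition(text_states, speaker_id=2, language_id=1)
```

The two calls share the same text and language and differ only in `speaker_id`, which mirrors the claim that changing an ID is enough to switch the speaker.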

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, and in particular to a multi-speaker, multi-language speech synthesis method and system.

Background

[0002] Speech synthesis is a technology that converts text information into speech information, that is, it converts text into audible speech, and it involves multiple disciplines such as acoustics, linguistics, and computer science. However, building a multi-speaker, multi-language speech synthesis system from monolingual speech databases while maintaining speaker consistency has always been a difficult problem. Traditional multilingual speech synthesis systems rely on multilingual speech databases. However, multilingual databases are difficult to obtain in practice (it is difficult to find speakers who are proficient in multiple languages to record speech data), and it is not possible to arbitrarily modify the speaker's timbre, language pronunciation, e...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G10L13/047, G10L13/04, G10L13/08, G10L25/24

CPC: G10L13/047, G10L13/08, G10L25/24

Inventor: 朱海, 王昆, 周琳珉, 刘书君

Owner: SICHUAN CHANGHONG ELECTRIC CO LTD
