
A multi-speaker, multi-language speech synthesis method and system

The invention belongs to the field of speech synthesis technology, specifically multi-speaker, multi-language speech synthesis methods and systems, and aims to achieve high sound quality, fast synthesis, and fluent conversion.

Active Publication Date: 2022-04-15
SICHUAN CHANGHONG ELECTRIC CO LTD

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a multi-speaker, multi-language speech synthesis method and system that solves a problem the prior art cannot: building multi-speaker, multi-language speech synthesis from monolingual speech databases while keeping the speaker consistent.



Examples


Embodiment 1

[0040] As shown in Figure 1, a multi-speaker, multi-language speech synthesis method includes:

[0041] Training phase:

[0042] Step S11: Obtain a multi-speaker, monolingual speech training database and extract speech acoustic features. The training data comprises speech from multiple speakers, with the corresponding texts, in at least two different languages. The speech acoustic features include Mel spectrum features, spectral energy features, and fundamental frequency features. Optionally, Chinese and English speech databases are selected as training databases: the Chinese data can use Biaobei's public female voice database together with a self-recorded speech database covering more than 20 speakers; the English data can use LJSpeech, VCTK, and other public databases;

[0043] Step S12: Process the texts in different languages of the speech training database into a unified representation, that is, process them into a un...
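The three feature types named in step S11 (Mel spectrum, spectral energy, fundamental frequency) can be illustrated with a minimal NumPy-only sketch. The patent text does not specify frame sizes, the number of Mel bands, or the pitch-tracking method, so the parameters below, the autocorrelation-based F0 estimate, and all function names are assumptions made for illustration. A production system would more likely rely on a dedicated audio library.

```python
import numpy as np

def frame_signal(wav, frame_len, hop):
    # Slice the waveform into overlapping frames (no padding, for simplicity).
    n_frames = 1 + (len(wav) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return wav[idx]

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale (HTK-style formula).
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def extract_features(wav, sr=16000, n_fft=1024, hop=256, n_mels=80,
                     f0_min=60.0, f0_max=400.0):
    # All default values here are illustrative assumptions, not from the patent.
    frames = frame_signal(wav, n_fft, hop)
    mag = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    mel = np.log(mag @ mel_filterbank(sr, n_fft, n_mels).T + 1e-5)  # (T, n_mels)
    energy = np.linalg.norm(mag, axis=1)                             # (T,)
    # F0 per frame: strongest autocorrelation peak within the allowed lag range.
    lag_min, lag_max = int(sr / f0_max), int(sr / f0_min)
    f0 = np.zeros(len(frames))
    for t, frame in enumerate(frames):
        ac = np.correlate(frame, frame, mode="full")[n_fft - 1:]
        if ac[0] <= 0:
            continue  # silent frame stays unvoiced (f0 = 0)
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
        if ac[lag] / ac[0] > 0.3:  # crude voicing threshold (assumption)
            f0[t] = sr / lag
    return mel, energy, f0
```

All three outputs share the same frame axis, which is what lets a later stage pair them frame by frame with the aligned text.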

Embodiment 2

[0056] A multi-speaker, multi-language speech synthesis system includes a text processing module, an information marking module, an information encoding module, an acoustic feature output module, and a vocoder module, wherein:

[0057] The text processing module is used to normalize the text, classify it by language, and convert text in different languages into a unified representation;

[0058] Optionally, the text in different languages in the speech database is processed into a unified phoneme representation, or into a unified Unicode encoding representation. For training, the MFA (Montreal Forced Aligner) algorithm is used to align the text and audio in the different languages, yielding the aligned text and the duration of each text unit; each duration is converted into a number of frames, and the sum of these frame counts equals the number of extracted Mel spectrum frames;
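As a rough illustration of the text processing module, the sketch below normalizes a string, classifies it as Chinese or English, and maps it to a shared symbol inventory. It takes the "unified Unicode encoding" option and interprets it as UTF-8 byte values, which is one possible reading rather than the patent's stated encoding. The language id assignment, the CJK-range language check, and every function name are hypothetical, and real text normalization (numbers, abbreviations, dates) is far more involved than the whitespace and width folding shown here.

```python
import unicodedata

LANG_IDS = {"en": 0, "zh": 1}  # hypothetical id assignment

def normalize(text):
    # NFKC folds full-width forms into their standard equivalents;
    # split/join collapses runs of whitespace.
    return " ".join(unicodedata.normalize("NFKC", text).split())

def detect_language(text):
    # Simplistic rule (assumption): any CJK Unified Ideograph means Chinese.
    has_cjk = any("\u4e00" <= ch <= "\u9fff" for ch in text)
    return "zh" if has_cjk else "en"

def to_unified_ids(text):
    # UTF-8 bytes give one shared inventory of 256 symbols for every language.
    return list(text.encode("utf-8"))

def process(text):
    clean = normalize(text)
    return {
        "text": clean,
        "language_id": LANG_IDS[detect_language(clean)],
        "token_ids": to_unified_ids(clean),
    }
```

For example, `process("你好")` yields six token ids (three bytes per character) and the Chinese language id, while `process("Hello  world")` yields eleven ids and the English one. Both sequences draw on the same 0 to 255 symbol range, so a single input embedding table can serve both languages.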

[0059] The information marking m...
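The duration-to-frame conversion described in paragraph [0058] can be sketched as follows. Rounding each duration independently would let errors accumulate, so this sketch rounds the cumulative boundary times instead and pins the last boundary to the Mel frame count, which guarantees the frame counts sum to the number of Mel frames. The sample rate, hop length, and function name are assumptions, and the per-unit durations are taken to be in seconds, as a forced aligner typically reports them.

```python
import numpy as np

def durations_to_frames(durations_sec, n_mel_frames, sr=22050, hop=256):
    # Convert cumulative end times (seconds) to frame indices so that
    # rounding errors do not accumulate across units.
    ends = np.cumsum(np.asarray(durations_sec, dtype=float))
    boundaries = np.rint(ends * sr / hop).astype(int)
    # Clip to the available Mel frames and force the final boundary to match.
    boundaries = np.minimum(boundaries, n_mel_frames)
    boundaries[-1] = n_mel_frames
    # Per-unit frame counts are the gaps between consecutive boundaries.
    return np.diff(np.concatenate([[0], boundaries]))

frames = durations_to_frames([0.10, 0.25, 0.05], n_mel_frames=35)
```

Here the three units receive 9, 21, and 5 frames, which sum to the 35 Mel frames of the utterance.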



Abstract

The invention discloses a multi-speaker, multi-language speech synthesis method. The method includes: extracting speech acoustic features; processing texts in different languages into a unified representation and aligning audio with text to obtain duration information; constructing a speaker space and a language space, generating a speaker id and a language id, and extracting a speaker vector and a language vector that are added to the initial speech synthesis model; training the initial speech synthesis model with the aligned text, the duration information, and the speech acoustic features to obtain the speech synthesis model; processing the text to be synthesized and generating a speaker id and a language id for it; and inputting the speaker id, the text, and the language id into the speech synthesis model, which outputs speech acoustic features that are then converted into audio. A system is also disclosed. The invention disentangles speaker features from language features, so that changing only the corresponding id switches the speaker or the language.
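The id-based conditioning described in the abstract can be pictured with a small NumPy sketch. The abstract says the speaker and language vectors are added to the model but does not say how, so element-wise addition to the text representation is an assumption here. The table sizes, the random values standing in for learned embeddings, and all names are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SYMBOLS, N_SPEAKERS, N_LANGS, DIM = 256, 4, 2, 8  # illustrative sizes

# Random stand-ins for embedding tables that training would learn.
symbol_table = rng.normal(size=(N_SYMBOLS, DIM))
speaker_table = rng.normal(size=(N_SPEAKERS, DIM))
language_table = rng.normal(size=(N_LANGS, DIM))

def condition(token_ids, speaker_id, language_id):
    # Look up one vector per input symbol: shape (T, DIM).
    text_hidden = symbol_table[np.asarray(token_ids)]
    # Broadcast-add the speaker and language vectors to every time step.
    return text_hidden + speaker_table[speaker_id] + language_table[language_id]

same_text = [72, 105]
as_speaker_0 = condition(same_text, speaker_id=0, language_id=1)
as_speaker_1 = condition(same_text, speaker_id=1, language_id=1)
```

In this sketch the two outputs differ only by `speaker_table[0] - speaker_table[1]`, a toy version of switching the speaker by changing the id while the text and language contributions stay fixed.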

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, in particular to a multi-speaker, multi-language speech synthesis method and system.

Background

[0002] Speech synthesis is a technology that converts text information into speech information, that is, converts arbitrary text into audible speech. It involves multiple disciplines such as acoustics, linguistics, and computer science. However, building a multi-speaker, multi-language speech synthesis system from monolingual speech databases while maintaining speaker consistency has always been a difficult problem. Traditional multilingual speech synthesis systems rely on multilingual speech databases. However, multilingual databases are difficult to obtain in practice (it is difficult to find speakers who are proficient in multiple languages to record speech data), and it is not possible to arbitrarily modify the speaker's timbre, language pronunciation, e...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L13/047, G10L13/04, G10L13/08, G10L25/24
CPC: G10L13/047, G10L13/08, G10L25/24
Inventor: 朱海, 王昆, 周琳珉, 刘书君
Owner: SICHUAN CHANGHONG ELECTRIC CO LTD