Multi-speaker and multi-language speech synthesis method and system thereof

A multi-speaker speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc., that achieves high voice quality, fast synthesis, and fluent speaker and language conversion.

Active Publication Date: 2021-03-02
SICHUAN CHANGHONG ELECTRIC CO LTD
10 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a multi-speaker, multi-language speech synthesis method and system that solve the prior art's inability to keep the speaker consistent across languages, and that realize multi-speaker, multi-language speech synthesis using only monolingual speech databases.

Examples

Embodiment 1

[0040] As shown in Figure 1, a multi-speaker, multi-language speech synthesis method includes:

[0041] Training phase:

[0042] Step S11: Obtain a multi-speaker, single-language speech training database and extract speech acoustic features. The training data includes multi-speaker speech data and the corresponding texts in at least two different languages. The speech acoustic features include Mel spectral features, spectral energy features, and fundamental frequency features. Optionally, Chinese and English speech databases are selected as training databases: the Chinese data can use Biaobei's public female voice database together with self-recorded voice databases covering more than 20 speakers, and the English data can use LJSpeech, VCTK, and other public databases;

[0043] Step S12: Process the texts in the different languages of the speech training database into a unified representation, that is, process them into a un...
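
As an illustration of the acoustic feature extraction named in step S11, the following is a minimal sketch that computes two of the three feature types, per-frame energy and fundamental frequency, in plain Python. The function names, the frame and hop sizes, the RMS definition of energy, and the autocorrelation-based F0 estimator are all assumptions made for this example; the patent text shown here does not specify how the features are computed. Mel spectral features are omitted because they are normally produced with a signal-processing library.

```python
import math

def frame_signal(samples, frame_len, hop_len):
    """Split a waveform into overlapping frames (a trailing partial frame is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop_len)]

def frame_energy(frame):
    """Root-mean-square energy of one frame (assumed energy definition)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def frame_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 in Hz from the autocorrelation peak; 0.0 means no pitch found."""
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0

# Usage on a synthetic 200 Hz tone (hypothetical input, not patent data).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(2048)]
frames = frame_signal(tone, frame_len=1024, hop_len=256)
energy = frame_energy(frames[0])
f0 = frame_f0(frames[0], sample_rate)
```

A production system would more likely use a dedicated pitch tracker and a windowed short-time Fourier transform; the sketch only shows that the features are frame-level quantities sharing one frame grid.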

Embodiment 2

[0056] A multi-speaker, multi-language speech synthesis system, including a text processing module, an information marking module, an information encoding module, an acoustic feature output module and a vocoder module, wherein:

[0057] The text processing module is used to normalize the text, classify the text according to the language and process the text in different languages into a unified expression;

[0058] Optionally, the texts in the different languages of the speech database are processed into a unified phoneme representation or into a unified Unicode encoding representation. For training, the MFA (Montreal Forced Aligner) algorithm is used to align the text and audio of each language, yielding the aligned text and the duration of each text unit. Each duration is converted into a number of frames, and the frame counts sum to the number of frames of the extracted Mel spectrum features;

[0059] The information marking m...
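
The duration-to-frame conversion described in [0058] can be sketched as follows. This is an illustrative example under stated assumptions, not the patent's implementation: the function name `durations_to_frames`, the sample rate and hop length, and the choice to absorb any rounding residual in the last unit are all hypothetical. The only requirement taken from the text is that the per-unit frame counts add up to the number of Mel spectrum frames.

```python
def durations_to_frames(durations_sec, sample_rate, hop_length, n_mel_frames):
    """Convert per-unit durations in seconds to integer frame counts.

    Cumulative boundaries are rounded rather than individual durations, so
    rounding error does not accumulate. Any remaining mismatch with the Mel
    frame count is absorbed by the last unit (an assumed policy).
    """
    frames = []
    elapsed = 0.0
    prev_boundary = 0
    for duration in durations_sec:
        elapsed += duration
        boundary = round(elapsed * sample_rate / hop_length)
        frames.append(boundary - prev_boundary)
        prev_boundary = boundary
    if frames:
        frames[-1] += n_mel_frames - prev_boundary
        if frames[-1] < 0:
            raise ValueError("durations exceed the Mel spectrogram length")
    return frames

# Hypothetical aligner output for three units, with a 41-frame Mel spectrogram.
frame_counts = durations_to_frames([0.10, 0.25, 0.15],
                                   sample_rate=16000, hop_length=200,
                                   n_mel_frames=41)
assert sum(frame_counts) == 41
```

Keeping the sum exact matters because a duration-driven acoustic model expands each text unit by its frame count, and the expanded sequence has to line up frame by frame with the target Mel spectrum.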

Abstract

The invention discloses a multi-speaker and multi-language speech synthesis method. The method comprises the following steps: extracting speech acoustic features; processing the texts in different languages into a unified representation, and aligning the audio with the texts to obtain duration information; constructing a speaker space and a language space, generating a speaker id and a language id, extracting a speaker vector and a language vector, adding the speaker vector and the language vector into the initial speech synthesis model, and training the initial speech synthesis model by using the aligned text, duration information and speech acoustic features to obtain a speech synthesis model; processing the to-be-synthesized text to generate the speaker id and the language id; and inputting the speaker id, the text and the language id into the speech synthesis model, outputting speech acoustic features and converting the speech acoustic features into audio. A system is also disclosed. The method disentangles the speaker characteristics from the language characteristics, so the speaker or the language can be switched simply by changing the corresponding id.
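
To make the id-based conditioning in the abstract concrete, here is a minimal sketch of looking up a speaker vector and a language vector by id and adding them to every frame of an encoder output. Everything in it is an assumption for illustration: the names `make_embedding_table` and `condition`, the randomly initialised tables (a trained model would learn them), and the element-wise addition. The abstract only says the vectors are added into the model and does not say whether that means summation, concatenation, or something else.

```python
import random

def make_embedding_table(num_ids, dim, seed):
    """Build a lookup table with one vector per id (random stand-in for learned values)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_ids)]

def condition(encoder_out, speaker_table, language_table, speaker_id, language_id):
    """Add the selected speaker and language vectors to each encoder frame."""
    speaker_vec = speaker_table[speaker_id]
    language_vec = language_table[language_id]
    return [[h + s + l for h, s, l in zip(frame, speaker_vec, language_vec)]
            for frame in encoder_out]

# Hypothetical setup: 3 speakers, 2 languages, 4-dimensional hidden states.
speaker_table = make_embedding_table(num_ids=3, dim=4, seed=0)
language_table = make_embedding_table(num_ids=2, dim=4, seed=1)
encoder_out = [[0.0] * 4 for _ in range(5)]  # 5 text positions

same_text_speaker_0 = condition(encoder_out, speaker_table, language_table, 0, 0)
same_text_speaker_1 = condition(encoder_out, speaker_table, language_table, 1, 0)
```

Because the text encoding is shared and only the looked-up vectors change, swapping `speaker_id` or `language_id` alters the conditioning without touching the text input, which is the behaviour the abstract attributes to changing the id.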

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, in particular to a multi-speaker and multi-language speech synthesis method and system.

Background

[0002] Speech synthesis is a technology that converts text information into speech information, that is, into arbitrary audible speech. It involves multiple disciplines such as acoustics, linguistics, and computer science. However, how to build a multi-speaker and multi-language speech synthesis system from monolingual speech databases while maintaining speaker consistency has always been a difficult problem. Traditional multilingual speech synthesis systems rely on multilingual speech databases. However, multilingual databases are difficult to obtain in practice (it is difficult to find speakers who are proficient in multiple languages to record speech data), and it is not possible to arbitrarily modify the speaker's timbre, language pronunciation, e...

Application Information

IPC(8): G10L13/047, G10L13/04, G10L13/08, G10L25/24
CPC: G10L13/047, G10L13/08, G10L25/24
Inventor: 朱海, 王昆, 周琳珉, 刘书君
Owner: SICHUAN CHANGHONG ELECTRIC CO LTD