Multi-speaker voice synthesis method, system and device

A speech synthesis and speaker technology, applied in speech synthesis, speech analysis, instruments, etc. It addresses two problems: the speaker's pronunciation characteristics are not described in fine detail, and the features used are not optimal for the speech synthesis task. The effect is a finer description of speaker characteristics and more accurate synthesis.

Inactive Publication Date: 2019-10-15
INST OF AUTOMATION CHINESE ACAD OF SCI
3 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

Traditional methods mostly reuse features from the speaker recognition task. These features are not optimal for the speech synthesis task and lack a fine description of the speaker's pronunciation characteristics.


Examples

Embodiment Construction

[0071] Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the present invention, and are not intended to limit the protection scope of the present invention.

[0072] The present invention provides a multi-speaker speech synthesis method. Text features are extracted from the text to be tested and dynamically combined with a sentence-level dictionary and a phoneme-level dictionary to obtain phoneme-related speaker features, which gives a finer description of the speaker's pronunciation characteristics. The speaker's voice information is then determined from the text features and the speaker features, and the speech is synthesized by a neural-network-based vocoder, which effectively improves the accuracy of speech synthesis. ...
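The order of the stages in paragraph [0072] can be sketched as a small pipeline. This is only an illustrative outline, not the patented implementation: the class name `SynthesisPipeline`, its field names, and the toy stand-in functions are all hypothetical, and each stage is left as a pluggable callable because the source does not specify the models.

```python
from dataclasses import dataclass
from typing import Callable, List

Features = List[List[float]]  # one feature vector per phoneme or frame


@dataclass
class SynthesisPipeline:
    """Hypothetical wiring of the four stages named in [0072]."""
    extract_text_features: Callable[[str], Features]
    combine_with_dictionaries: Callable[[Features], Features]
    predict_voice_information: Callable[[Features, Features], Features]
    vocode: Callable[[Features], List[float]]

    def synthesize(self, text: str) -> List[float]:
        # 1. Text analysis yields text features.
        text_feats = self.extract_text_features(text)
        # 2. Text features are combined with the sentence-level and
        #    phoneme-level dictionaries to give speaker features.
        speaker_feats = self.combine_with_dictionaries(text_feats)
        # 3. Text and speaker features determine the voice information.
        voice_info = self.predict_voice_information(text_feats, speaker_feats)
        # 4. A vocoder turns the voice information into a waveform.
        return self.vocode(voice_info)


# Toy stand-ins (assumptions) that only demonstrate the data flow.
pipeline = SynthesisPipeline(
    extract_text_features=lambda text: [[float(ord(c))] for c in text],
    combine_with_dictionaries=lambda feats: [[0.1] for _ in feats],
    predict_voice_information=lambda t, s: [a + b for a, b in zip(t, s)],
    vocode=lambda frames: [sum(frame) for frame in frames],
)
waveform = pipeline.synthesize("hi")
```

Keeping each stage behind a callable makes it possible to swap in real text analysis, acoustic models, and a neural vocoder without changing the control flow.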


Abstract

The invention relates to a multi-speaker voice synthesis method and a multi-speaker voice synthesis system. The method comprises the steps of: extracting acoustic statistical features of speech from a multi-speaker corpus to obtain a sentence-level dictionary and a phoneme-level dictionary; extracting text features from the to-be-tested text based on a text analysis method; dynamically combining the text features with the sentence-level dictionary and the phoneme-level dictionary to obtain phoneme-related speaker features; determining speaker voice information from the text features and the speaker features, based on an average sub-model and a self-adaptive sub-model; and synthesizing voice through a neural-network-based vocoder according to the voice information of the speaker. Because the text features are dynamically combined with the sentence-level dictionary and the phoneme-level dictionary to obtain phoneme-related speaker features, the speaker features are described in finer detail. Furthermore, because the voice is synthesized through a neural-network-based vocoder, the accuracy of voice synthesis can be effectively improved.
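One way to picture the dynamic combination step is a soft lookup: the text feature of a phoneme is scored against every entry of the phoneme-level dictionary, the entries are averaged with softmax weights, and the result is appended to a sentence-level speaker vector. The abstract does not state the actual combination rule, so the dot-product scoring, the function name `combine_speaker_features`, and the toy dictionary below are assumptions for illustration only.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def combine_speaker_features(
    text_feature: Vector,
    phoneme_dict: Dict[str, Vector],
    sentence_vector: Vector,
) -> Vector:
    """Hypothetical soft lookup; not the patent's stated formula.

    Dictionary entries are assumed to share the text feature's dimension.
    """
    keys = list(phoneme_dict)
    # Score each phoneme-level entry against the current text feature.
    weights = softmax([dot(text_feature, phoneme_dict[k]) for k in keys])
    dim = len(text_feature)
    # Weighted average of the phoneme-level entries.
    phoneme_part = [
        sum(w * phoneme_dict[k][i] for w, k in zip(weights, keys))
        for i in range(dim)
    ]
    # Sentence-level part is fixed per utterance; phoneme part varies.
    return sentence_vector + phoneme_part
```

Under this sketch the sentence-level part stays constant across an utterance while the phoneme-level part changes with each phoneme's text feature, which is one reading of "phoneme-related" speaker features.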

Description

Technical field

[0001] The present invention relates to the technical field of speech synthesis, in particular to a multi-speaker speech synthesis method, system and device based on phoneme-related speaker features.

Background technique

[0002] Speech synthesis technology, also known as text-to-speech (TTS) technology, converts text information into voice information. Currently, there are two main methods of speech synthesis:

[0003] The first is corpus-based concatenative (splicing) synthesis, which directly selects appropriate units from the originally recorded corpus and splices them into speech. The second is parametric speech synthesis, a specific implementation of statistical acoustic modeling: it models the acoustic parameters of speech, reconstructs the acoustic parameter trajectory through a parameter generation algorithm, and finally invokes a speech synthesis device to genera...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/08, G10L13/047, G10L25/30, G10L25/03
CPC: G10L13/047, G10L13/08, G10L25/03, G10L25/30
Inventors: 陶建华, 傅睿博, 温正棋
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI