Speech synthesis method and device, equipment and storage medium

A speech synthesis and audio technology, applied in speech synthesis, speech analysis, instruments, and related fields, that addresses two problems: synthesized speech with only a single emotional expression, and the inability to control speaking style independently.
CN112786009A · Pending · Publication Date: 2021-05-11 · PING AN TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
PING AN TECH (SHENZHEN) CO LTD
Publication Date
2021-05-11


Abstract

The invention relates to the technical field of artificial intelligence and discloses a speech synthesis method and device, computer equipment, and a computer-readable storage medium. The method comprises: obtaining a text to be processed and a style audio for the speech to be synthesized, and inputting both into a preset speech synthesis model; encoding the style audio with the multi-reference encoder to obtain style embedding vector information; encoding the text with the text encoder to obtain text encoding vector information; concatenating the style embedding vector information and the text encoding vector information through the fully connected layer to generate a Mel spectrogram; and performing feature extraction on the Mel spectrogram through the output layer to output the target audio for the text. This allows the speaking style of the synthesized speech to be controlled and yields speech with richer emotional expression.
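The concatenation step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the dimensions, the helper names (`make_linear`, `linear`, `splice`, `synthesize_mel`), the random weights, the text-then-style concatenation order, and the choice to attach the same style vector to every text time step are all assumptions made for illustration. The encoders are replaced by random stand-in vectors, and the output layer that turns the Mel spectrogram into audio is omitted.

```python
import random

# All sizes and helper names here are hypothetical, chosen only for illustration.
STYLE_DIM = 4   # size of the style embedding vector
TEXT_DIM = 6    # size of each per-step text encoding
N_MELS = 8      # number of mel bins per output frame


def make_linear(in_dim, out_dim, rng):
    """Return random (weights, bias) standing in for a trained fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(layer, vec):
    """Apply a fully connected layer: one dot product plus bias per output unit."""
    weights, bias = layer
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def splice(style_embedding, text_encodings):
    """Concatenate the single style vector onto every text time step (assumed scheme)."""
    return [list(step) + list(style_embedding) for step in text_encodings]


def synthesize_mel(style_embedding, text_encodings, layer):
    """Map each spliced vector to one mel frame (one frame per step is a simplification)."""
    return [linear(layer, vec) for vec in splice(style_embedding, text_encodings)]


rng = random.Random(0)
# Stand-in for the multi-reference encoder output (one vector per reference audio).
style = [rng.random() for _ in range(STYLE_DIM)]
# Stand-in for the text encoder output (one vector per text step, 5 steps here).
text = [[rng.random() for _ in range(TEXT_DIM)] for _ in range(5)]
fc = make_linear(TEXT_DIM + STYLE_DIM, N_MELS, rng)
mel = synthesize_mel(style, text, fc)
print(len(mel), len(mel[0]))  # prints "5 8": 5 frames of 8 mel bins
```

Because the style vector enters only through the concatenation, swapping in a different style vector while keeping the text encodings fixed changes the generated frames without touching the text path, which is the kind of separate style control the abstract describes.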

Description

Technical field

[0001] The present application relates to the technical field of speech semantics, and in particular to a speech synthesis method, device, computer equipment, and computer-readable storage medium.

Background art

[0002] Speech synthesis must account not only for the clarity and fluency of the synthesized speech but also for its prosodic information, so that the speech carries rich emotional expression. Beyond the smoothness of the sentence, synthesis should also reflect changes in the speaker's emotional state, with the model learning the style information of a reference audio so that the result approaches the level of a human voice. In current prosody modeling, the common approach is to fold all speaking styles into a single representation. The styles cannot be separated, so they cannot be controlled individually, and the emotional expressi...
