Speech synthesis method and device, computer equipment and storage medium
A speech synthesis and phoneme technology, applied in the fields of speech synthesis, speech analysis, instruments, etc., that improves the naturalness of synthesized speech while ensuring robustness
Examples
Embodiment 1
[0031] Figure 1 is a flowchart of a speech synthesis method provided by Embodiment 1 of the present invention. This embodiment is applicable to the case of training a DurIAN (Duration Informed Attention Network) network as the acoustic model of a TTS system and training a HiFi-GAN (Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis) network as the vocoder of the TTS system. The method can be performed by a speech synthesis device, which can be implemented in software and/or hardware and can be configured in computer equipment such as a server, a workstation, or a personal computer. The method specifically includes the following steps:
[0032] Step 101. Obtain the audio signal recorded when the speaker speaks in a specified style, the text information expressing the content of the audio signal, and the spe...
Embodiment 2
[0182] Figure 3 is a flowchart of a speech synthesis method provided by Embodiment 2 of the present invention. This embodiment is applicable to the case where the DurIAN network is used as the TTS acoustic model and the HiFi-GAN network is used as the TTS vocoder for speech synthesis. The method can be performed by a speech synthesis device, which can be implemented in software and/or hardware and can be configured in a computer device such as a server, a workstation, a personal computer, or a mobile terminal (for example a mobile phone, a tablet computer, or a smart wearable device). The method specifically includes the following steps:
[0183] Step 301. Determine the text information of the speech to be synthesized, the speaker and the style of the text information.
[0184] In this embodiment, the user selects the text information to be synthesized in the client, for example, content from a novel, a news article, or a web page, etc., can display ...
Embodiment 3
[0244] Figure 4 is a structural block diagram of a speech synthesis device provided in Embodiment 3 of the present invention, which may specifically include the following modules:
[0245] A synthesized data determination module 401, configured to determine the text information of the speech to be synthesized, together with the speaker and the style for the text information;
[0246] A language information extraction module 402, configured to extract linguistic information from the text information as language information;
[0247] A synthesis system determination module 403, configured to designate the DurIAN network as the acoustic model and the HiFi-GAN network as the vocoder;
[0248] A spectral feature generating module 404, configured to input the language information into the DurIAN network serving as the acoustic model, and convert it into a spectral feature that conforms to the speaker speaking the text information in the style;
[0249] A speech signal generating module 405, configured to i...
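The data flow through modules 401 to 405 can be illustrated with a minimal Python sketch. It is not the patented implementation: every name in it (`SynthesisRequest`, `extract_language_info`, `toy_acoustic_model`, `toy_vocoder`, `synthesize`) is hypothetical, the two "models" are trivial stand-ins for the DurIAN and HiFi-GAN networks, and the vocoder stage is assumed to turn spectral features into waveform samples, as vocoders conventionally do. The sketch only shows how text, speaker, and style pass from one stage to the next.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Spectrogram = List[List[float]]  # one inner list of values per frame


@dataclass(frozen=True)
class SynthesisRequest:
    """Inputs gathered by module 401 (hypothetical container)."""
    text: str
    speaker: str
    style: str


def extract_language_info(text: str) -> List[str]:
    """Module 402 stand-in: treat each letter as one linguistic unit."""
    return [ch for ch in text.lower() if ch.isalpha()]


def toy_acoustic_model(language_info: Sequence[str], speaker: str, style: str,
                       n_bins: int = 4, frames_per_unit: int = 2) -> Spectrogram:
    """Module 404 stand-in for DurIAN (assumption, not the real network).

    Gives every unit a fixed duration and a constant frame value. The
    speaker and style arguments are accepted but ignored by this toy.
    """
    frames: Spectrogram = []
    for unit in language_info:
        value = (ord(unit) - ord("a")) / 25.0
        frames.extend([[value] * n_bins for _ in range(frames_per_unit)])
    return frames


def toy_vocoder(spectral: Spectrogram, hop_length: int = 3) -> List[float]:
    """Module 405 stand-in for HiFi-GAN: hop_length samples per frame."""
    samples: List[float] = []
    for frame in spectral:
        mean = sum(frame) / len(frame)
        samples.extend([mean] * hop_length)
    return samples


def synthesize(
    request: SynthesisRequest,
    acoustic_model: Callable[..., Spectrogram] = toy_acoustic_model,
    vocoder: Callable[[Spectrogram], List[float]] = toy_vocoder,
) -> List[float]:
    """Module 403 corresponds to choosing acoustic_model and vocoder here."""
    language_info = extract_language_info(request.text)
    spectral = acoustic_model(language_info, request.speaker, request.style)
    return vocoder(spectral)


if __name__ == "__main__":
    req = SynthesisRequest(text="Hi", speaker="speaker_a", style="news")
    print(len(synthesize(req)))
```

Passing the acoustic model and vocoder as parameters mirrors the role of module 403: the pipeline itself stays the same, and real networks could be substituted for the stand-ins without changing `synthesize`.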