Speech synthesis method and device, computer equipment and storage medium

A speech synthesis and phoneme technology, applied in speech synthesis, speech analysis, instruments, etc., that improves naturalness and ensures robustness.

Pending Publication Date: 2021-08-31

GUANGZHOU HUYA TECH CO LTD +1




Problems solved by technology

[0005] The embodiments of the present invention propose a speech synthesis method, device, computer equipment, and storage medium to improve the robustness of timbre cloning under low-resource conditions.


Examples


Embodiment 1

[0031] Figure 1 is a flowchart of a speech synthesis method provided by Embodiment 1 of the present invention. This embodiment is applicable to training a DurIAN (Duration Informed Attention Network) network as the acoustic model of TTS and training a HiFi-GAN (Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis) network as the vocoder of TTS. The method can be performed by a speech synthesis device, which can be implemented in software and/or hardware and configured in computer equipment such as a server, a workstation, or a personal computer. It specifically includes the following steps:

[0032] Step 101. Obtain the audio signal recorded when the speaker speaks in a specified style, the text information expressing the content of the audio signal, and the spe...

Embodiment 2

[0182] Figure 3 is a flowchart of a speech synthesis method provided by Embodiment 2 of the present invention. This embodiment is applicable to the case where the DurIAN network is used as the TTS acoustic model and the HiFi-GAN network is used as the TTS vocoder for speech synthesis. The method can be performed by a speech synthesis device, which can be implemented in software and/or hardware and configured in a computer device such as a server, a workstation, a personal computer, or a mobile terminal (such as a mobile phone, a tablet computer, or a smart wearable device). It specifically includes the following steps:

[0183] Step 301. Determine the text information of the speech to be synthesized, the speaker and the style of the text information.

[0184] In this embodiment, the user selects the text information to be synthesized in the client, for example, the content in the novel, the content in the news, the content in the webpage, etc., can display ...
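As a rough illustration of Step 301, the three items it determines (text information, speaker, and style) can be grouped into a single request object. This is only a sketch: the class name `SynthesisRequest`, its field names, and the example values are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisRequest:
    """Inputs determined in Step 301 (all names here are hypothetical)."""

    text: str     # text information of the speech to be synthesized
    speaker: str  # identifier of the speaker whose timbre is the target
    style: str    # speaking style label

    def __post_init__(self) -> None:
        # Reject empty fields early so later stages can rely on them.
        for name in ("text", "speaker", "style"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must be a non-empty string")


# Example values are made up for illustration.
request = SynthesisRequest(text="Hello, world.", speaker="speaker_a", style="news")
```

Validating the request at construction time keeps the later stages free of input checks; whether the described method validates its inputs this way is an assumption.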

Embodiment 3

[0244] Figure 4 is a structural block diagram of a speech synthesis device provided in Embodiment 3 of the present invention, which may specifically include the following modules:

[0245] The synthesized data determination module 401 is used to determine the text information of the voice to be synthesized, the speaker and the style of the text information;

[0246] A language information extraction module 402, configured to extract information representing linguistics from the text information as language information;

[0247] Synthesis system determination module 403, used to determine that the DurIAN network is an acoustic model, and the HiFi-GAN network is a vocoder;

[0248] A spectral feature generating module 404, configured to input the language information into the DurIAN network as an acoustic model, and convert it into a spectral feature conforming to the speaker speaking the text information in the style;

[0249] The speech signal generating module 405 is configured to i...
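The module structure above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the patented implementation: the acoustic model and vocoder are toy stand-ins rather than real DurIAN or HiFi-GAN networks, the language-information extraction is a placeholder, and every function, class, and type name is hypothetical. The final stage follows the abstract's description of feeding the spectral features to the vocoder.

```python
from typing import Callable, List

# Type aliases used only in this sketch (hypothetical).
LanguageInfo = List[str]      # e.g. a token sequence
Spectrum = List[List[float]]  # frames x frequency bins
Waveform = List[float]        # audio samples

AcousticModel = Callable[[LanguageInfo, str, str], Spectrum]
Vocoder = Callable[[Spectrum], Waveform]


def extract_language_info(text: str) -> LanguageInfo:
    """Stand-in for module 402: keeps lowercase letters only.

    Real linguistic analysis is assumed to be far richer than this.
    """
    return [ch for ch in text.lower() if ch.isalpha()]


def toy_acoustic_model(info: LanguageInfo, speaker: str, style: str) -> Spectrum:
    """Stand-in for the DurIAN acoustic model: one 4-bin frame per token."""
    offset = (len(speaker) + len(style)) % 7
    return [[float((ord(tok) + offset + b) % 10) for b in range(4)] for tok in info]


def toy_vocoder(spectrum: Spectrum) -> Waveform:
    """Stand-in for the HiFi-GAN vocoder: two samples per frame."""
    samples: Waveform = []
    for frame in spectrum:
        mean = sum(frame) / len(frame)
        samples.extend([mean, -mean])
    return samples


class SpeechSynthesisDevice:
    def __init__(self, acoustic_model: AcousticModel, vocoder: Vocoder) -> None:
        # Role of module 403: fix which network acts as acoustic model and vocoder.
        self.acoustic_model = acoustic_model
        self.vocoder = vocoder

    def synthesize(self, text: str, speaker: str, style: str) -> Waveform:
        # Role of module 401 is played by the caller supplying text, speaker, style.
        info = extract_language_info(text)                    # module 402
        spectrum = self.acoustic_model(info, speaker, style)  # module 404
        return self.vocoder(spectrum)                         # final stage, per the abstract


device = SpeechSynthesisDevice(toy_acoustic_model, toy_vocoder)
waveform = device.synthesize("Hi there", "speaker_a", "news")
```

Passing the two networks in as callables keeps the acoustic model and vocoder swappable, which mirrors the separation of roles in modules 403 to 405; the toy functions would be replaced by trained networks in any real system.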


Abstract

The embodiments of the invention provide a speech synthesis method and device, computer equipment, and a storage medium. The method comprises: determining the text information of the speech to be synthesized, the speaker who speaks the text information, and the style; extracting information representing linguistics from the text information as language information; determining a DurIAN network as the acoustic model and a HiFi-GAN network as the vocoder; inputting the language information into the DurIAN network to convert it into spectral features conforming to the speaker speaking the text information in the style; and inputting the spectral features into the HiFi-GAN network to convert them into a speech signal conforming to the speaker speaking the text information in the style. By combining the DurIAN network and the HiFi-GAN network in a TTS system, the robustness of the cloned timbre can be ensured under low-resource conditions, and the naturalness of the synthesized speech and its similarity to the timbre of the speaker serving as the cloning target are improved.

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of speech processing, and in particular to a speech synthesis method, device, computer equipment, and storage medium.

Background

[0002] TTS (Text To Speech) aims to convert text into speech. It is part of human-machine dialogue and allows the machine to speak. In recent years, with the rapid development of acoustic model and vocoder technology, TTS has come to play an important role in many fields, such as voice assistants, audiobooks, and spoken dialogue systems.

[0003] For speakers with a large amount of high-quality recorded speech, TTS can generate natural speech that can almost pass for the real thing. However, TTS is still limited to the ideal situation in which the training set contains a large amount of data from a single speaker, and its expressiveness is not rich enough.

[0004] In the case of low resources, especially when the speaker's language samples are scarce and the recording...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L13/08, G10L25/30

CPC: G10L13/08, G10L25/30

Inventor: 康世胤, 刘峰, 陀得意, 游于人, 王洁, 吴志勇

Owner: GUANGZHOU HUYA TECH CO LTD
