Speech synthesis method and device, computer equipment and storage medium

A speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc., that achieves the effect of reducing training difficulty while using an appropriate amount of data

Pending Publication Date: 2021-08-31

GUANGZHOU HUYA TECH CO LTD +1

Cites: 0 | Cited by: 2

AI Technical Summary

Problems solved by technology

[0004] The embodiments of the present invention propose a speech synthesis method, device, computer equipment and storage medium to solve the problem of how to clone a timbre for speech synthesis when that timbre has not been seen during training.

Examples

Embodiment 1

[0030] Figure 1 is a flowchart of a speech synthesis method provided by Embodiment 1 of the present invention. This embodiment applies to training the acoustic model of a speech synthesizer for timbres that have not been seen. The method can be executed by a speech synthesis device, which can be implemented in software and/or hardware and configured in computer equipment such as servers, workstations, or personal computers. The method specifically includes the following steps:

[0031] Step 101. Obtain a sample speech signal, sample text information expressing the content of the sample speech signal, and sample spectral features converted from the sample speech signal.
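As an illustration of the "sample spectral features converted from the sample speech signal" in Step 101, the sketch below computes a short-time magnitude spectrogram in plain Python. The excerpt does not say which spectral features the patent uses, so the choice of a Hann-windowed magnitude spectrogram, the function name `magnitude_spectrogram`, and the `frame_len` and `hop` values are all assumptions made for illustration. Practical TTS systems often use mel-spectrograms computed with an FFT library instead.

```python
import cmath
import math
from typing import List


def magnitude_spectrogram(signal: List[float],
                          frame_len: int = 8,
                          hop: int = 4) -> List[List[float]]:
    """Hypothetical helper: Hann-windowed short-time DFT magnitudes.

    Returns one row per frame, each with frame_len // 2 + 1 frequency bins.
    A naive DFT is used for clarity; it is far slower than an FFT.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# Toy "sample speech signal": a sine wave centered on frequency bin 2.
sample_signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(16)]
sample_spectrum = magnitude_spectrogram(sample_signal)
```

In a training set, each entry would then pair the sample speech signal, its sample text information, and a feature matrix such as `sample_spectrum`.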

[0032] Traditional cross-language TTS models usually use a small number of speakers (from a few to a dozen or so). The structure of the speech synthesizer of this embodiment is more robust and can support the use of large-scale multi-sp...

Embodiment 2

[0124] Figure 3 is a flowchart of a speech synthesis method provided by Embodiment 2 of the present invention. This embodiment applies to the situation where an application uses a speech synthesizer to perform cross-language speech synthesis. The method can be executed by a speech synthesis device, which can be implemented in software and/or hardware and configured in computer equipment such as servers, workstations, personal computers, or mobile terminals (such as mobile phones, tablet computers, and smart wearable devices). The method specifically includes the following steps:

[0125] Step 301. Receive a reference speech signal belonging to a non-target language and target text information belonging to a target language.

[0126] In this embodiment, the operating systems in the computer equipment include Windows, Android, iOS, etc., and these operating systems can support clients running speech synthesis, for example, nove...

Embodiment 3

[0157] Figure 4 is a structural block diagram of a speech synthesis device provided in Embodiment 3 of the present invention, which may specifically include the following modules:

[0158] A synthesis information receiving module 401, configured to receive a reference speech signal belonging to a non-target language and target text information belonging to a target language;

[0159] A target timbre extraction module 402, configured to identify the feature representing timbre in the reference speech signal as the target timbre;

[0160] A speech synthesizer determining module 403, configured to determine a speech synthesizer trained for the target language, the speech synthesizer including an acoustic model and a vocoder;

[0161] A target spectral feature generating module 404, configured to convert, in the acoustic model, the target text information into spectral features belonging to the target language and conforming to the target timbre, as target spectral features; ...

Abstract

The embodiments of the invention provide a speech synthesis method and device, computer equipment, and a storage medium. The method comprises: receiving a reference speech signal belonging to a non-target language and target text information belonging to a target language; identifying the feature representing timbre in the reference speech signal as the target timbre; determining a speech synthesizer trained for the target language, the speech synthesizer comprising an acoustic model and a vocoder; converting, in the acoustic model, the target text information into spectral features that belong to the target language and conform to the target timbre, as target spectral features; and converting, in the vocoder, the target spectral features into a target speech signal belonging to the target language. The timbre of the non-target-language reference speech signal is not used to train the speech synthesizer for the target language, so timbre cloning for a speaker not seen in training can be realized in cross-language speech synthesis scenarios.
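The data flow in the abstract can be sketched as follows. This is only an outline of the order of operations under stated assumptions: the names `SpeechSynthesizer`, `synthesize`, and the three `toy_*` functions are hypothetical, and the toy functions are trivial arithmetic stand-ins for trained models, not the patent's timbre extractor, acoustic model, or vocoder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Signal = List[float]          # waveform samples
Timbre = List[float]          # timbre feature vector
Spectrum = List[List[float]]  # one spectral frame per row


@dataclass
class SpeechSynthesizer:
    """Hypothetical container for a synthesizer trained for one language."""
    language: str
    acoustic_model: Callable[[str, Timbre], Spectrum]
    vocoder: Callable[[Spectrum], Signal]


def synthesize(reference_speech: Signal,
               target_text: str,
               target_language: str,
               extract_timbre: Callable[[Signal], Timbre],
               synthesizers: Dict[str, SpeechSynthesizer]) -> Signal:
    # 1. Identify the timbre feature of the (non-target-language) reference.
    target_timbre = extract_timbre(reference_speech)
    # 2. Determine the synthesizer trained for the target language.
    synthesizer = synthesizers[target_language]
    # 3. Acoustic model: target text + target timbre -> target spectral features.
    target_spectrum = synthesizer.acoustic_model(target_text, target_timbre)
    # 4. Vocoder: target spectral features -> target speech signal.
    return synthesizer.vocoder(target_spectrum)


# Toy stand-ins (assumptions, for demonstration only).
def toy_extract_timbre(signal: Signal) -> Timbre:
    mean = sum(signal) / len(signal)
    energy = sum(s * s for s in signal) / len(signal)
    return [mean, energy]


def toy_acoustic_model(text: str, timbre: Timbre) -> Spectrum:
    # One frame per character, scaled by a value derived from the character.
    return [[(ord(ch) % 7) * t for t in timbre] for ch in text]


def toy_vocoder(spectrum: Spectrum) -> Signal:
    # One output sample per frame.
    return [sum(frame) for frame in spectrum]


synthesizers = {"en": SpeechSynthesizer("en", toy_acoustic_model, toy_vocoder)}
speech = synthesize([0.5, -0.5, 0.5, -0.5], "hello", "en",
                    toy_extract_timbre, synthesizers)
```

The reference speech enters only through `extract_timbre` at synthesis time, which mirrors the abstract's point that the reference speaker's timbre is not used to train the target-language synthesizer.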

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of speech processing, and in particular to a speech synthesis method, device, computer equipment and storage medium.

Background

[0002] TTS (Text To Speech) aims to convert text into speech. It is a part of human-machine dialogue that allows machines to speak. In recent years, with the rapid development of acoustic model and vocoder technology, TTS has come to play an important role in many fields, such as voice assistants, audiobooks, and spoken dialogue systems.

[0003] For speakers with a large amount of high-quality speech data, TTS can generate natural speech that can almost pass for the real thing. At present, TTS is limited to the training set and clones the timbres of speakers seen in training. However, speakers' timbres are difficult to obtain. Especially in the cross-language TTS scene, it is difficult to collect the timbre of the speakers, and the timbres of ...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L13/08, G10L25/30

CPC: G10L13/08, G10L25/30

Inventor: 户建坤, 康世胤, 吴志勇, 陈学源, 刘峰

Owner: GUANGZHOU HUYA TECH CO LTD
