Speech synthesis method and device, computer equipment and storage medium
A speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc., that reduces the difficulty of training and keeps the required amount of data appropriate.
Examples
Embodiment 1
[0030] Figure 1 is a flowchart of a speech synthesis method provided by Embodiment 1 of the present invention. This embodiment applies to training the acoustic model of a speech synthesizer without having seen the timbre. The method can be executed by a speech synthesis device, which can be implemented in software and/or hardware and configured in computer equipment such as servers, workstations, and personal computers. The method specifically includes the following steps:
[0031] Step 101. Obtain a sample speech signal, sample text information expressing the content of the sample speech signal, and sample spectral features converted from the sample speech signal.
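As a rough illustration of Step 101, the sketch below assembles one training sample consisting of a speech signal, its text, and spectral features derived from the signal. The excerpt does not say which spectral features are used, so a log-magnitude short-time Fourier transform is assumed here as a stand-in; the function name `log_magnitude_spectrogram`, the frame and hop sizes, the synthetic tone, and the sample text are all hypothetical.

```python
import numpy as np


def log_magnitude_spectrogram(signal, frame_len=400, hop=160):
    """Split the signal into Hann-windowed frames and return log-magnitude FFT frames.

    Assumption: the patent excerpt does not name the spectral feature type;
    a plain log-magnitude STFT is used here only for illustration.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad very short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # A small constant avoids log(0) on silent bins.
    return np.log(magnitude + 1e-6)


# Hypothetical sample: one second of a 440 Hz tone at 16 kHz stands in for recorded speech.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
sample_speech = 0.5 * np.sin(2 * np.pi * 440.0 * t)
sample_text = "hello world"  # placeholder transcript of the sample speech
sample_spectral = log_magnitude_spectrogram(sample_speech)

# The three items that Step 101 obtains for one training sample.
training_sample = {
    "speech": sample_speech,
    "text": sample_text,
    "spectral": sample_spectral,
}
```

Each row of `sample_spectral` is one frame and each column is one frequency bin; a real system would more likely read recorded audio and might use a different feature such as a mel spectrogram.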
[0032] Traditional cross-language TTS models are usually trained on a small number of speakers (from a few to a little over a dozen). The structure of the speech synthesizer of this embodiment is more robust and can support the use of large-scale multi-sp...
Embodiment 2
[0124] Figure 3 is a flowchart of a speech synthesis method provided by Embodiment 2 of the present invention. This embodiment applies to the situation where an application uses a speech synthesizer to perform cross-language speech synthesis. The method can be executed by a speech synthesis device, which can be implemented in software and/or hardware and configured in computer equipment such as servers, workstations, personal computers, and mobile terminals (such as mobile phones, tablet computers, and smart wearable devices). The method specifically includes the following steps:
[0125] Step 301. Receive a reference speech signal belonging to a non-target language and target text information belonging to a target language.
[0126] In this embodiment, the operating system of the computer equipment may be Windows, Android, iOS, etc., and these operating systems can support clients that run speech synthesis, for example, nove...
Embodiment 3
[0157] Figure 4 is a structural block diagram of a speech synthesis device provided in Embodiment 3 of the present invention. The device may specifically include the following modules:
[0158] A synthesis information receiving module 401, configured to receive a reference speech signal belonging to a non-target language and target text information belonging to a target language;
[0159] A target timbre extraction module 402, configured to extract features characterizing the timbre of the reference speech signal as the target timbre;
[0160] A speech synthesizer determining module 403, configured to determine a speech synthesizer trained for the target language, the speech synthesizer including an acoustic model and a vocoder;
[0161] A target spectral feature generating module 404, configured to convert, in the acoustic model, the target text information into spectral features that belong to the target language and conform to the target timbre, as target spectral features; ...
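The data flow through modules 401 to 404 can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: every class and function name, the language codes `"zh"` and `"en"`, and the toy timbre, acoustic model, and vocoder functions are hypothetical placeholders. The module list is truncated after module 404 in this excerpt, so the sketch stops at the target spectral features and never calls the vocoder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Timbre = List[float]
Spectral = List[List[float]]


@dataclass
class SpeechSynthesizer:
    """A synthesizer trained for one language: an acoustic model plus a vocoder."""
    language: str
    acoustic_model: Callable[[str, Timbre], Spectral]
    vocoder: Callable[[Spectral], List[float]]  # held here, not called in this sketch


@dataclass
class SynthesisRequest:
    reference_speech: Sequence[float]
    reference_language: str
    target_text: str
    target_language: str


def receive_request(reference_speech, reference_language, target_text, target_language):
    """Module 401: accept reference speech in a non-target language and target-language text."""
    if reference_language == target_language:
        raise ValueError("reference speech must belong to a non-target language")
    return SynthesisRequest(reference_speech, reference_language, target_text, target_language)


def extract_timbre(reference_speech: Sequence[float]) -> Timbre:
    """Module 402 (toy stand-in): summarize the signal as [mean, mean absolute value]."""
    n = max(len(reference_speech), 1)
    mean = sum(reference_speech) / n
    mean_abs = sum(abs(x) for x in reference_speech) / n
    return [mean, mean_abs]


def determine_synthesizer(registry: Dict[str, SpeechSynthesizer], target_language: str):
    """Module 403: look up the synthesizer trained for the target language."""
    return registry[target_language]


def generate_target_spectral(synthesizer: SpeechSynthesizer, target_text: str, timbre: Timbre):
    """Module 404: run the acoustic model on the text, conditioned on the target timbre."""
    return synthesizer.acoustic_model(target_text, timbre)


def toy_acoustic_model(text: str, timbre: Timbre) -> Spectral:
    # Placeholder: one frame per character, shifted by the timbre values.
    return [[ord(ch) / 1000.0 + t for t in timbre] for ch in text]


def toy_vocoder(spectral: Spectral) -> List[float]:
    # Placeholder: one output value per frame.
    return [sum(frame) for frame in spectral]


registry = {"en": SpeechSynthesizer("en", toy_acoustic_model, toy_vocoder)}
request = receive_request([0.1, -0.2, 0.3], "zh", "hi", "en")
target_timbre = extract_timbre(request.reference_speech)
synthesizer = determine_synthesizer(registry, request.target_language)
target_spectral = generate_target_spectral(synthesizer, request.target_text, target_timbre)
```

Keeping each module as a separate function mirrors the block diagram: the timbre comes only from the reference speech, while the choice of synthesizer depends only on the target language.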