Speech synthesis method, system and device and storage medium

A speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc., that addresses the problem of low synthesis efficiency, ensuring speech synthesis quality while improving speech synthesis efficiency.

Active Publication Date: 2021-09-17

ALIBABA GRP HLDG LTD

5 Cites, 0 Cited by

Problems solved by technology

The existing LPCNet has a certain degree of computational redundancy, so its synthesis efficiency is low.

Examples

[0047] An embodiment of the present application also provides a speech synthesis method based on a neural network combined with linear predictive coding. As shown in Figure 2a, the method includes:

[0048] 21a. Acquire acoustic features of the text to be synthesized on multiple channels, where different channels correspond to different acoustic frequency bands.

[0049] 22a. Use a neural network combined with linear predictive coding to perform prediction on the acoustic features on the multiple channels, obtaining linear prediction parameters and nonlinear residuals on the multiple channels.

[0050] 23a. Perform speech synthesis according to the linear prediction parameters and nonlinear residuals on multiple channels, to obtain synthesized speech corresponding to the text to be synthesized.
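Steps 21a to 23a can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `stub_predictor` is a hypothetical stand-in for the neural network (it returns a fixed first-order coefficient and reuses the input features as the residual), and recombining channels by a sample-wise sum is a placeholder, since the text above does not say how the per-channel signals are merged. Only `lpc_synthesize` reflects standard linear prediction, where each sample is a weighted sum of previous samples plus a residual.

```python
from typing import Callable, List, Sequence, Tuple

Prediction = Tuple[List[float], List[float]]  # (linear prediction coefficients, residual)


def lpc_synthesize(coeffs: Sequence[float], residual: Sequence[float]) -> List[float]:
    """Reconstruct one channel: s[t] = sum_k coeffs[k] * s[t-1-k] + residual[t]."""
    out: List[float] = []
    for t, e in enumerate(residual):
        pred = 0.0
        for k, a in enumerate(coeffs):
            if t - 1 - k >= 0:  # samples before the start are treated as zero
                pred += a * out[t - 1 - k]
        out.append(pred + e)
    return out


def stub_predictor(features: Sequence[float]) -> Prediction:
    # HYPOTHETICAL placeholder for the neural network in step 22a:
    # a fixed first-order coefficient, with the features reused as the residual.
    return [0.5], list(features)


def synthesize(
    channel_features: Sequence[Sequence[float]],
    predictor: Callable[[Sequence[float]], Prediction] = stub_predictor,
) -> List[float]:
    """Run prediction and linear-prediction synthesis on every channel."""
    channels = []
    for feats in channel_features:  # one entry per acoustic frequency band (step 21a)
        coeffs, residual = predictor(feats)  # step 22a
        channels.append(lpc_synthesize(coeffs, residual))  # step 23a, per channel
    # ASSUMPTION: merge channels by a sample-wise sum; the source does not specify this.
    return [sum(samples) for samples in zip(*channels)]


if __name__ == "__main__":
    print(synthesize([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
```

In a real system the predictor would be a trained network, and the per-channel signals would be merged by whatever band-recombination scheme the full description specifies.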

[0051] In this embodiment, a multi-channel speech synthesis method is adopted, which can improve speech synthesis efficiency compared with a single-channel speech synthesis method; further, the lin...

Abstract

Embodiments of the invention provide a speech synthesis method, system, device and storage medium. A multi-channel linear prediction network vocoder that supports multi-channel input is provided. Acoustic features of the text to be synthesized are acquired on multiple channels, and the multi-channel linear prediction network vocoder synthesizes the corresponding speech signal from them. Speech synthesis based on linear prediction ensures synthesis quality, while the use of multiple channels improves synthesis efficiency.

Description

Technical field

[0001] The present application relates to the technical field of speech signal processing, and in particular to a speech synthesis method, system, device and storage medium.

Background

[0002] Speech synthesis, also known as text-to-speech (TTS) technology, generates artificial speech by mechanical and electronic means. In the speech synthesis process, the front end and middle end are responsible for predicting compressed features of the speech from the text, such as Mel-Frequency Cepstral Coefficients (MFCC); the remaining stage is completed by a vocoder.

[0003] The Linear Predictive Coding Net (LPCNet) vocoder is a WaveRNN variant that combines a Recurrent Neural Network (RNN) with linear prediction. By combining deep learning with digital signal processing, it greatly improves the quality of speech synthesis, so it is widely used in speech synthesis sys...

Application Information

IPC(8): G10L13/02, G10L13/04, G10L13/08, G10L19/04, G10L19/16, G10L25/03, G10L25/30

CPC: G10L13/02, G10L13/04, G10L13/08, G10L19/16, G10L19/04, G10L25/03, G10L25/30

Inventor: 杨辰雨, 雷鸣

Owner ALIBABA GRP HLDG LTD
