Speech synthesis method and device, electronic equipment and program product

A speech synthesis technology, applied in the field of speech synthesis methods, devices, electronic equipment and program products. It addresses the high complexity and computational load of the LPCNet vocoder, and the resulting large computing time and computing resources, which hinder practical application.

Active Publication Date: 2021-06-11

BEIJING DIDI INFINITY TECH & DEV

4 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0004] The LPCNet approach can effectively guarantee the quality of the synthesized speech, but the complexity and computational load of the LPCNet vocoder remain relatively high. Speech synthesis therefore requires more computing time and computing resources, which hinders practical application.

Examples

preparation example Construction

[0031] The speech synthesis method provided by the embodiments of the present disclosure can be applied to the network architecture shown in Figure 1, a schematic diagram of the network-based system architecture of the present disclosure. As shown in Figure 1, the network system includes a speech synthesis device 1 and an electronic device 2.

[0032] The speech synthesis device 1 described in the present disclosure can be installed or integrated in the electronic device 2. The electronic device 2 can specifically be a smart terminal, such as a smartphone, a tablet computer or a desktop computer, that is, processing equipment that can perform data calculations according to preset calculation logic.

[0033] The electronic device 2 can acquire the voice text to be synthesized from the network and analyze it to obtain the corresponding acoustic feature data and the feature sampling data corresponding to that acoustic feature data. Then, the speech synthesis device 1 will acquire t...

Embodiment 1

[0110] Embodiment 1. A speech synthesis method, comprising:

[0111] Acquiring feature sampling data of the acoustic feature data at multiple sampling moments;

[0112] Using a speech synthesis network to simultaneously perform prediction processing on the feature sampling data at the multiple sampling moments, to obtain linear prediction data and nonlinear prediction data at any two target sampling moments in the multiple sampling moments;

[0113] Determining the speech synthesis data at the two target sampling moments according to the linear prediction data and the nonlinear prediction data at the two target sampling moments.
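
The three steps of Embodiment 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `predict_pair` stands in for the speech synthesis network and is hypothetical, as is the stub `toy_predict_pair`. Combining the linear and nonlinear parts by addition follows the LPCNet convention and is an assumption here, since the text only says the speech synthesis data is determined according to both parts. The sketch also assumes an even number of sampling moments.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[float, float]
# Hypothetical predictor signature: (features, index m, samples so far)
# -> ((P_m, P_m+1), (E_m, E_m+1)).
Predictor = Callable[[Sequence[float], int, List[float]], Tuple[Pair, Pair]]


def synthesize(features: Sequence[float], predict_pair: Predictor) -> Tuple[List[float], int]:
    """Produce two samples per predictor call; return (samples, number of calls)."""
    samples: List[float] = []
    calls = 0
    for m in range(0, len(features) - 1, 2):
        (p_m, p_m1), (e_m, e_m1) = predict_pair(features, m, samples)
        calls += 1
        # Assumption: sample = linear part + nonlinear part (LPCNet convention).
        samples.append(p_m + e_m)
        samples.append(p_m1 + e_m1)
    return samples, calls


def toy_predict_pair(features: Sequence[float], m: int, history: List[float]) -> Tuple[Pair, Pair]:
    """Stub standing in for the network; the arithmetic is purely illustrative."""
    prev = history[-1] if history else 0.0
    linear = (0.5 * prev, 0.25 * prev)
    nonlinear = (features[m], features[m + 1])
    return linear, nonlinear


samples, calls = synthesize([1.0, 2.0, 3.0, 4.0], toy_predict_pair)
print(samples, calls)
```

One predictor call yields two samples, so four sampling moments need only two calls.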

Embodiment 2

[0114] Embodiment 2. According to the speech synthesis method described in Embodiment 1, using the speech synthesis network to simultaneously perform prediction processing on the feature sampling data at the multiple sampling moments, to obtain the linear prediction data and nonlinear prediction data at any two target sampling moments in the multiple sampling moments, includes:

[0115] Performing linear prediction processing on the feature sampling data at the multiple sampling moments, to obtain the linear speech data P_m at the m-th sampling moment and the linear speech data P_{m+1} at the (m+1)-th sampling moment;

[0116] Obtaining the speech synthesis data S_{m-1} and nonlinear speech data E_{m-1} at the (m-1)-th sampling moment, and the speech synthesis data S_{m-2} and nonlinear speech data E_{m-2} at the (m-2)-th sampling moment;

[0117] For the feature sampling data, speech synthesis data S_{m-1}, nonlinear speech data E_{m-1}, speech synthesis data S_{m-2}, nonlinear speech data E_{m-2}, linear speech data P_m and the linear speech d...
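
The quantities that Embodiment 2 gathers before the network step can be collected as in the sketch below. Paragraph [0117] is truncated in this copy, so what the network does with these inputs is not shown and is not guessed at here. The function name `build_network_input`, its parameters, and the ordering of the values are hypothetical, and including the linear speech data of the (m+1)-th moment is an assumption based on [0115].

```python
from typing import List, Sequence


def build_network_input(
    features: Sequence[float],
    synth_history: Sequence[float],       # speech synthesis data; last item is moment m-1
    nonlinear_history: Sequence[float],   # nonlinear speech data; last item is moment m-1
    p_m: float,                           # linear speech data at moment m
    p_m1: float,                          # linear speech data at moment m+1
) -> List[float]:
    """Concatenate the values named in Embodiment 2 (ordering is an assumption)."""
    if len(synth_history) < 2 or len(nonlinear_history) < 2:
        raise ValueError("need data for the two previous sampling moments")
    return [
        *features,
        synth_history[-1], nonlinear_history[-1],   # moment m-1
        synth_history[-2], nonlinear_history[-2],   # moment m-2
        p_m, p_m1,
    ]


vector = build_network_input([0.1, 0.2], [1.0, 2.0], [0.5, 0.25], 3.0, 4.0)
print(vector)
```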

Abstract

The embodiments of the invention provide a speech synthesis method and device, electronic equipment and a program product. The method comprises: acquiring feature sampling data of acoustic feature data at a plurality of sampling moments; using a speech synthesis network to simultaneously perform prediction processing on the feature sampling data at the plurality of sampling moments, to obtain linear prediction data and nonlinear prediction data at any two target sampling moments in the plurality of sampling moments; and determining speech synthesis data at the two target sampling moments according to the linear prediction data and the nonlinear prediction data at the two target sampling moments. With this method, prediction processing can be performed simultaneously for two adjacent target sampling moments among the plurality of sampling moments of the acoustic feature data, yielding the speech synthesis data at both target sampling moments. The real-time rate of speech synthesis is thereby greatly improved while the speech synthesis quality is maintained.

Description

Technical field

[0001] The embodiments of the present disclosure relate to streaming media data processing technologies, and in particular to a speech synthesis method and device, electronic equipment and a program product.

Background technique

[0002] In speech technology, the quality of the vocoder determines the quality of the synthesized speech. With the development of deep learning, neural networks can be used to improve the quality of the vocoder.

[0003] The LPCNet vocoder combines a neural network with Linear Predictive Coding (LPC). Based on the WaveRNN network, it decomposes each sample value into a linear part and a nonlinear part: the linear part is output by linear prediction, and the nonlinear part is given by the neural network, which together yield the sample value in the vocoder.

[0004] Such a method can effectively guarantee the speech quality of speech synthesis, ...
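
The linear/nonlinear decomposition described in [0003] can be illustrated with a short Python sketch. It assumes the standard LPC form, in which the linear part of a sample is a weighted sum of the preceding samples and the nonlinear part is the remainder that the neural network would have to supply. The function names and the coefficient values are chosen for illustration only and do not come from the patent.

```python
from typing import List, Sequence, Tuple


def lpc_predict(history: Sequence[float], coeffs: Sequence[float]) -> float:
    """Linear part: sum over k of a_k * s[t-k], where history[-1] is s[t-1]."""
    return sum(a * history[-k] for k, a in enumerate(coeffs, start=1))


def decompose(signal: Sequence[float], coeffs: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split each sample (from index len(coeffs) on) into linear and nonlinear parts."""
    order = len(coeffs)
    linear: List[float] = []
    nonlinear: List[float] = []
    for t in range(order, len(signal)):
        p = lpc_predict(signal[:t], coeffs)
        linear.append(p)
        nonlinear.append(signal[t] - p)  # the part a network would need to predict
    return linear, nonlinear


# Illustrative coefficients [2, -1] predict a straight ramp exactly,
# so only the final sample (5.0 instead of 4.0) leaves a nonzero remainder.
linear, nonlinear = decompose([0.0, 1.0, 2.0, 3.0, 5.0], [2.0, -1.0])
print(linear, nonlinear)
```

Adding the two lists element by element recovers the original samples from index 2 onward, which is the sense in which the sample value is split into two parts.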

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L13/047, G10L25/30, G10L19/04, G10L19/16

CPC: G10L13/047, G10L25/30, G10L19/04, G10L19/16

Inventor: 文成, 郭庭炜

Owner: BEIJING DIDI INFINITY TECH & DEV
