Voice synthesis method and device and electronic equipment

A speech synthesis and speech library technology, applied in speech synthesis, speech analysis, instruments, etc., that addresses problems such as degraded sound quality, poor stability outside the set, and loss of sound quality and tone details, improving both the effect and the efficiency of synthesis.

Active Publication Date: 2019-07-23

BEIJING SINOVOICE TECH CO LTD

15 Cites, 1 Cited by


AI Technical Summary

Problems solved by technology

Statistical parameter synthesis and voice selection splicing synthesis each have their own strengths and weaknesses, and neither can completely replace the other. Voice selection splicing synthesis has realistic sound quality and natural durations, but its splicing flaws are obvious and its stability outside the set is poor. Statistical parameter synthesis is stable and its coarticulation is smooth, but its sound quality is degraded and its durations are averaged out.

Ultimately, parametric synthesis balances the degree of fit inside and outside the set and achieves smooth coarticulation only at the cost of "melting and flattening" the individuality of the samples in the set, so the details of sound quality and tone are lost.


Examples


Embodiment 1

[0067] Figure 1 is a flowchart of the steps of a speech synthesis method provided by an embodiment of the present invention.

[0068] Referring to Figure 1, the speech synthesis method provided by this embodiment is applied to electronic devices such as computers or speech synthesis equipment, and specifically includes the following steps:

[0069] S1. Perform text analysis on the input text.

[0070] When text is input, either directly by the user or by other electronic equipment, text analysis is performed on it to obtain the target primitive sequence and the corresponding context information. The target primitive sequence here includes multiple target primitives.

[0071] S2. Use the traditional model decision tree to determine, for the context information, the subclass number to which it belongs in each voice selection target model of the speech library, together with the corresponding Gaussian distribution model.

[0072] The...
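
Steps S1 and S2 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the whitespace tokenizer, the `prev`/`next` context fields, the tree questions, and the one-dimensional Gaussian parameters are hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Context = Dict[str, str]

@dataclass
class Leaf:
    subclass_id: int   # subclass number attached to this leaf
    mean: float        # mean of a 1-D Gaussian (a real model would be multivariate)
    var: float         # variance of that Gaussian

@dataclass
class Node:
    question: Callable[[Context], bool]  # yes/no question about the context
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]

def analyze_text(text: str) -> List[Context]:
    """Toy text analysis: one primitive per token, with its neighbours as context."""
    padded = ["sil"] + text.split() + ["sil"]
    return [
        {"unit": padded[i], "prev": padded[i - 1], "next": padded[i + 1]}
        for i in range(1, len(padded) - 1)
    ]

def lookup(tree: Union[Node, Leaf], ctx: Context) -> Leaf:
    """Descend the decision tree until a leaf (subclass + Gaussian) is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(ctx) else node.no
    return node

# Hypothetical tree with two questions and three leaves.
tree = Node(
    question=lambda c: c["prev"] == "sil",
    yes=Leaf(0, 120.0, 25.0),
    no=Node(lambda c: c["next"] == "sil", Leaf(1, 95.0, 16.0), Leaf(2, 110.0, 36.0)),
)

contexts = analyze_text("ni hao ma")
leaves = [lookup(tree, c) for c in contexts]
print([leaf.subclass_id for leaf in leaves])  # [0, 2, 1]
```

Each target primitive ends up with one subclass number and one Gaussian, which is the information a later pre-selection stage would consume.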

Embodiment 2

[0150] Figure 2 is a structural block diagram of a speech synthesis device provided by an embodiment of the present invention.

[0151] Referring to Figure 2, the speech synthesis device provided by this embodiment is applied to electronic equipment such as computers or speech synthesis equipment, and specifically includes a text analysis module 10, a first calculation module 20, a distance calculation module 30, a grid construction module 40, a second calculation module 50, a third calculation module 60, a fourth calculation module 70, a path selection module 80 and a splicing output module 90.

[0152] The text analysis module is used to perform text analysis on the input text.

[0153] When text is input, either directly by the user or by other electronic equipment, text analysis is performed on it to obtain the target primitive sequence and the corresponding context information. The target primitive sequence here include...

Embodiment 3

[0186] This embodiment provides an electronic device, such as a speech synthesis device, a computer or a mobile terminal, which is provided with the speech synthesis device of the previous embodiment. The device performs text analysis on the input text to obtain the target primitive sequence and the corresponding context information; for the context information, it uses the traditional model decision tree to determine the subclass number to which the context information belongs in each voice selection target model of the speech library, together with the corresponding Gaussian distribution model, and obtains the corresponding pre-selection results; the pre-selection results form one column for each target primitive in turn, so that the target primitive sequence as a whole forms a set of candidate grids; the context information is input into the deep learning model to obtain the acoustic parameter envelope, primitive duration, and boundary frame acoustic parame...
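
The candidate grid and the later path selection can be sketched as a dynamic-programming search that picks one candidate per column. This is a generic unit-selection search written under assumptions, not the patent's own cost definition: the `target_cost` and `join_cost` functions, the use of plain floats as candidates, and the example numbers are all hypothetical, and the grid is assumed to be non-empty with at least one candidate per column.

```python
from typing import Callable, List, Sequence, Tuple

def best_path(
    grid: Sequence[Sequence[float]],
    target_cost: Callable[[int, float], float],
    join_cost: Callable[[float, float], float],
) -> Tuple[float, List[float]]:
    """Return (total cost, chosen candidates), one candidate per grid column."""
    # cost[j]: cheapest cost of any path ending at candidate j of the current column
    cost = [target_cost(0, c) for c in grid[0]]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor index for column t
    for t in range(1, len(grid)):
        prev = grid[t - 1]
        new_cost, pointers = [], []
        for cand in grid[t]:
            k = min(range(len(prev)), key=lambda i: cost[i] + join_cost(prev[i], cand))
            new_cost.append(cost[k] + join_cost(prev[k], cand) + target_cost(t, cand))
            pointers.append(k)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest candidate in the final column.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    indices = [j]
    for pointers in reversed(back):
        j = pointers[j]
        indices.append(j)
    indices.reverse()
    return total, [grid[t][i] for t, i in enumerate(indices)]

# Hypothetical example: candidates are single acoustic values per primitive.
targets = [100.0, 110.0, 105.0]
grid = [[98.0, 120.0], [111.0, 90.0], [104.0, 130.0]]
total, path = best_path(
    grid,
    target_cost=lambda t, c: abs(c - targets[t]),   # distance to the predicted target
    join_cost=lambda a, b: 0.1 * abs(a - b),        # mismatch at the splice point
)
print(path)  # [98.0, 111.0, 104.0]
```

The search costs time proportional to the number of columns times the square of the candidates per column, which is why a pre-selection step that keeps the columns short matters in practice.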


Abstract

The embodiment of the invention provides a voice synthesis method and device and electronic equipment. In this technical scheme, deep learning technology is introduced in moderation on the voice selection splicing synthesis path while traditional statistical learning technology is not abandoned altogether, so that the advantages of both are exploited. The key innovation is that a deep learning model generates simulation data that is fed back into the training of the traditional statistical learning model; traditional learning is thereby improved in both algorithm and data, which in turn improves the effect of voice synthesis.
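
The idea of feeding simulated data back into traditional statistical training can be illustrated with a small sketch. The abstract does not say how the simulated data is produced or combined, so everything here is an assumption: `teacher` is a hypothetical stand-in for a trained deep learning model, the samples are single numbers, and the "traditional model" is reduced to one Gaussian re-estimated on the pooled data.

```python
from statistics import fmean, pvariance
from typing import Callable, List, Sequence, Tuple

def fit_gaussian(samples: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood mean and variance of a 1-D Gaussian."""
    return fmean(samples), pvariance(samples)

def simulate(teacher: Callable[[float], float], contexts: Sequence[float]) -> List[float]:
    """Generate one simulated sample per context from the stand-in deep model."""
    return [teacher(c) for c in contexts]

# Sparse real samples for one decision-tree subclass (hypothetical values).
real = [101.0, 99.0]

# Hypothetical stand-in for a deep model mapping a context feature to a parameter.
teacher = lambda ctx: 100.0 + 2.0 * ctx

simulated = simulate(teacher, [-1.0, 0.0, 1.0])
mean, var = fit_gaussian(real + simulated)
print(mean, var)  # 100.0 2.0
```

With only the two real samples the estimated variance would be 1.0; pooling the simulated samples changes the estimate, which shows how generated data can influence the statistics of a sparsely populated subclass.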

Description

Technical field

[0001] The present invention relates to the technical field of speech synthesis, in particular to a speech synthesis method, device and electronic equipment.

Background

[0002] In recent years, as the wave of deep learning has swept across the fields related to machine learning, speech synthesis has also been in flux. From acoustic parameter modeling, speech enhancement and vocoders to prosody analysis and other text preprocessing stages, researchers have tried to apply state-of-the-art deep learning techniques, and have even attempted to model "end-to-end" directly from text to waveform, with impressive results.

[0003] Over the past ten years of development in speech synthesis, the contest between the two routes of statistical parameter synthesis and voice selection splicing synthesis has persisted. The two have their own strengths and weaknesses, and they cannot completely replace each other: the sound selection and sp...


Application Information


IPC(8): G10L13/02, G10L13/08, G10L25/30

CPC: G10L13/02, G10L13/08, G10L25/30

Inventor: 王愈, 李健, 张连毅, 武卫东

Owner BEIJING SINOVOICE TECH CO LTD
