Speech synthesis method, device and equipment and storage medium

A speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc. It addresses the lack of versatility of existing speech synthesis methods, whose models cannot be used by ordinary people. It improves versatility and saves the time needed to record voice data.

Active Publication Date: 2020-09-15

SOUNDAI TECH CO LTD

8 Cites, 15 Cited by


Problems solved by technology

[0004] At the same time, multi-speaker technology based on end-to-end TTS has also developed considerably. Building on existing end-to-end TTS, researchers add speaker labels to the audio of multiple speakers to distinguish them and then train a multi-speaker model. The speaker number then selects which voice is used to synthesize the current text, allowing flexible switching between speakers. This has some practical value, but it has a major limitation: the model needs a large amount of data from each speaker. Each speaker must provide at least several hours of professionally recorded, high-quality voice data to ensure the quality and practicability of the model, so the approach is not universal. Ordinary people usually have no professional recording equipment or recording environment. The process also usually requires on-site supervision by a dedicated person, with recordings repeated to ensure quality. Ordinary people do not have the time to record such high-quality training audio at length, which puts the model out of their reach.

[0005] To sum up, the speech synthesis methods in the prior art are not universal, and cannot meet the needs of ordinary people who do not have professional recording equipment and recording environments.

Detailed description of embodiments

[0049] As shown in Figure 1, an embodiment of the present invention provides a speech synthesis method, which may include the following steps:

[0050] Step 101: Receive a voice broadcast instruction, which includes the voice broadcast text and the target object whose timbre is to be used for the broadcast voice.

[0051] Step 102: Obtain a preset number of pre-collected voice data items of the target object, and extract the voiceprint feature information of the target object using the pre-trained voiceprint recognition model. The pre-trained voiceprint recognition model is generated by training on pre-collected voice data of multiple objects, and the preset number is less than a preset number threshold.

[0052] Step 103: Using the pre-trained speech synthesis model, based on the voice broadcast text, the pre-trained voiceprint recognition model and the voiceprint feature information of the target object, synthesize the to-be-p...
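The data flow of steps 101 to 103 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: every name in it (`BroadcastInstruction`, `synthesize_for_instruction`, `toy_encoder`, `toy_synthesizer`) is hypothetical, the two "models" are trivial stand-ins for the pre-trained voiceprint recognition and speech synthesis models, and averaging the per-utterance voiceprints is an assumption that the text above does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]


@dataclass
class BroadcastInstruction:
    """Step 101: the instruction carries the text and the target object."""
    text: str
    target_object: str


def mean_embedding(embeddings: Sequence[Vector]) -> Vector:
    """Average several voiceprint vectors (an assumed pooling choice)."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def synthesize_for_instruction(
    instruction: BroadcastInstruction,
    voice_store: Dict[str, List[Vector]],
    encoder: Callable[[Vector], Vector],
    synthesizer: Callable[[str, Vector], dict],
    preset_number: int,
    number_threshold: int,
) -> dict:
    # The preset number of utterances must stay below the threshold.
    if preset_number >= number_threshold:
        raise ValueError("preset_number must be less than number_threshold")

    # Step 102: fetch the pre-collected utterances of the target object.
    utterances = voice_store[instruction.target_object][:preset_number]
    if len(utterances) < preset_number:
        raise ValueError("not enough pre-collected voice data")

    # Step 102: extract one voiceprint per utterance, then pool them.
    voiceprint = mean_embedding([encoder(u) for u in utterances])

    # Step 103: synthesize speech conditioned on text and voiceprint.
    return synthesizer(instruction.text, voiceprint)


def toy_encoder(samples: Vector) -> Vector:
    """Stand-in for the voiceprint model: [mean, max] of the samples."""
    return [sum(samples) / len(samples), max(samples)]


def toy_synthesizer(text: str, voiceprint: Vector) -> dict:
    """Stand-in for the synthesis model: returns its inputs, not audio."""
    return {"text": text, "voiceprint": voiceprint}


store = {"mom": [[0.0, 2.0], [2.0, 4.0], [9.0, 9.0]]}
result = synthesize_for_instruction(
    BroadcastInstruction("fasten your seat belt", "mom"),
    store, toy_encoder, toy_synthesizer,
    preset_number=2, number_threshold=5,
)
print(result)
```

In this sketch only the first `preset_number` utterances are used, which mirrors the idea that a small amount of the target object's voice data is enough once the voiceprint model has been trained on many other speakers.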

Abstract

An embodiment of the invention provides a speech synthesis method, device, equipment and storage medium, which improve the universality of speech synthesis and meet the needs of ordinary users who have no professional recording equipment or recording environment. The speech synthesis method comprises the following steps: a voice broadcast instruction is received, the instruction comprising a voice broadcast text and a target object whose timbre is to be used for the broadcast voice; a preset number of pre-collected voice data items of the target object is acquired, and voiceprint feature information of the target object is extracted using a pre-trained voiceprint recognition model, where the pre-trained voiceprint recognition model is generated by training on pre-collected voice data of a plurality of objects and the preset number is smaller than a preset number threshold; a to-be-played voice with the timbre of the target object is synthesized using a pre-trained speech synthesis model, based on the voice broadcast text, the pre-trained voiceprint recognition model and the voiceprint feature information of the target object; and the synthesized to-be-played voice is played.

Description

Technical field

[0001] The present invention relates to the field of voice interaction, in particular to a speech synthesis method, device, equipment and storage medium.

Background

[0002] In-vehicle voice interaction systems have long attracted public attention. A good voice interaction system can not only improve the safety awareness of drivers and passengers but also make the vehicle environment more intelligent. At present, celebrity timbres are very popular in in-vehicle voice navigation. Such timbres are mainly entertaining, whereas customized timbres may improve the driver's safety awareness. Voice navigation is often accompanied by safety reminders, such as "fasten your seat belt" and "slow down ahead". If the navigation voice is customized to that of the driver's parents, partner or children, the driver will be more willing to listen to it, even when the reminder seems unimportant. These "friendly" safety reminders will also enhance the sens...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L13/047, G10L13/08, G10L13/10, G10L17/00, G10L17/02, G10L17/18

CPC: G10L13/047, G10L13/08, G10L13/10, G10L17/02, G10L17/18, Y02D30/70

Inventors: 杜慷, 冯大航, 陈孝良

Owner: SOUNDAI TECH CO LTD
