Speech synthesis method, device and electronic equipment based on timbre cloning

A speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc., that addresses the long time and high economic cost of producing variable speech, reducing recording time and saving recording cost

Active Publication Date: 2021-02-02

BEIJING QIYU INFORMATION TECH CO LTD

5 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0004] The present invention aims to solve the technical problem that synthesizing the variable voice of a target timbre is time-consuming and economically costly.

Examples

[0062] See Figure 1, which is a flowchart of a speech synthesis method based on timbre cloning provided by the present invention. As shown in Figure 1, the method includes:

[0063] S1. Training a basic TTS model on an open-source corpus;

[0064] TTS is a technology for converting text into sound. It mainly comprises front-end processing, creation of the TTS model, and a vocoder. Front-end processing operates on the corpus in text form and converts arbitrary text into linguistic features; it usually includes submodules for text normalization, word segmentation, part-of-speech prediction, grapheme-to-phoneme conversion, polyphone disambiguation, and prosody prediction. Text normalization converts written expressions into spoken expressions, such as 1% into "one percent" and 1kg into "one kilogram". Word segmentation and part-of-speech prediction are the basis of prosody prediction. Grapheme-to-phoneme conversion of speech into...
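The text normalization step of the front end can be illustrated with a small rule-based sketch. This is only an illustration under stated assumptions, not the patent's implementation: the function names, the unit table, and the English spoken forms are all hypothetical, and a production front end would cover far more cases.

```python
import re

# Hypothetical table of unit symbols and their spoken forms (singular, plural).
UNIT_WORDS = {"kg": ("kilogram", "kilograms"), "km": ("kilometer", "kilometers")}

# Spoken forms for small integers only; larger numbers stay as digits in this sketch.
SMALL_NUMBERS = ["zero", "one", "two", "three", "four", "five",
                 "six", "seven", "eight", "nine", "ten"]


def number_to_words(digits: str) -> str:
    """Spell out integers from 0 to 10; return anything larger unchanged."""
    value = int(digits)
    return SMALL_NUMBERS[value] if value < len(SMALL_NUMBERS) else digits


def normalize_text(text: str) -> str:
    """Rewrite percentages and unit quantities from written to spoken form."""
    # Percentages, e.g. "1%" becomes "one percent".
    text = re.sub(r"(\d+)%",
                  lambda m: f"{number_to_words(m.group(1))} percent",
                  text)

    # Quantities with a known unit, e.g. "1kg" becomes "one kilogram".
    def spell_unit(match: re.Match) -> str:
        singular, plural = UNIT_WORDS[match.group(2)]
        unit = singular if int(match.group(1)) == 1 else plural
        return f"{number_to_words(match.group(1))} {unit}"

    unit_pattern = r"(\d+)\s*(" + "|".join(UNIT_WORDS) + r")\b"
    return re.sub(unit_pattern, spell_unit, text)
```

Calling `normalize_text("1% of 1kg")` rewrites both tokens into words before the text reaches the later front-end submodules. Rule tables like this are easy to audit, which is one reason normalization is often kept separate from the acoustic model.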

Abstract

The invention discloses a speech synthesis method, device and electronic equipment based on timbre cloning. The method comprises: training a basic TTS model on an open-source corpus; fine-tuning the basic model on a target-timbre corpus to obtain a fine-tuned model; generating a variable voice in the target timbre from a variable corpus and the fine-tuned model; and synthesizing a target-timbre voice from the variable voice and the fixed voice. The method first trains on the open-source corpus and then fine-tunes the TTS network on a small amount of target-timbre corpus. Compared with exhaustive manual recording, or with traditional TTS synthesis that requires a high-quality corpus, the method effectively shortens the recording time for the target-timbre corpus and greatly reduces the recording cost.
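The final step, combining fixed and variable voice into one utterance, might be sketched as follows. The abstract does not say how the two are combined, so simple concatenation is an assumption here. The names `assemble_utterance`, `fixed_recordings`, `synthesize_variable`, and `gap_samples` are hypothetical, and waveforms are modeled as plain lists of float samples to keep the example self-contained.

```python
from typing import Callable, Dict, List, Sequence

Waveform = List[float]


def assemble_utterance(
    script: Sequence[str],
    fixed_recordings: Dict[str, Waveform],
    synthesize_variable: Callable[[str], Waveform],
    gap_samples: int = 0,
) -> Waveform:
    """Concatenate fixed recordings and synthesized variable segments in script order.

    Assumption: pieces found in fixed_recordings are fixed voice; every other
    piece is variable voice and is sent to the (hypothetical) fine-tuned TTS
    model, represented here by the synthesize_variable callable.
    """
    output: Waveform = []
    for index, piece in enumerate(script):
        if piece in fixed_recordings:
            segment = fixed_recordings[piece]
        else:
            segment = synthesize_variable(piece)
        # Insert optional silence between consecutive segments.
        if index > 0 and gap_samples > 0:
            output.extend([0.0] * gap_samples)
        output.extend(segment)
    return output
```

In practice `synthesize_variable` would wrap the fine-tuned model and vocoder; passing it in as a callable keeps the assembly logic testable with a stand-in function.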

Description

Technical field

[0001] The present invention relates to the technical field of intelligent voice, and in particular to a speech synthesis method, device, electronic equipment and computer-readable medium based on timbre cloning.

Background

[0002] During intelligent voice interaction, voice robots usually interact with users through preset scripts. A preset script is generally synthesized from fixed voice and variable voice. A fixed voice is common to all users, while a variable voice must change for each individual user. For example, in the preset script "Hello! Mr. xx.", "Hello" and "Mr." can be used for all male users and are therefore fixed voices, while "xx" must change according to the name of each male user and is therefore a variable voice.

[0003] In the prior art, the fixed voice is pre-recorded by a professional sound engineer, and a generation method of the variable voice is to reduce the variable vo...
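The distinction between fixed and variable voice in a preset script can be made concrete with a short sketch. It assumes, purely for illustration, that scripts are written as Python format templates with named placeholders; the function name `split_script` and the placeholder `{name}` are hypothetical and do not come from the patent.

```python
from string import Formatter
from typing import Dict, List, Tuple


def split_script(template: str, values: Dict[str, str]) -> List[Tuple[str, str]]:
    """Split a preset script into ("fixed", text) and ("variable", text) segments.

    Literal text in the template is treated as fixed voice; each named
    placeholder is filled from `values` and treated as variable voice.
    """
    segments: List[Tuple[str, str]] = []
    for literal, field, _spec, _conversion in Formatter().parse(template):
        if literal:
            segments.append(("fixed", literal))
        if field is not None:
            segments.append(("variable", values[field]))
    return segments


# Example: the greeting from the background section, with a hypothetical user name.
segments = split_script("Hello! Mr. {name}.", {"name": "Zhang"})
```

Only the segments tagged `"variable"` would need to be generated per user; the `"fixed"` segments can be recorded once and reused.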

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G10L13/02, G10L13/04, G10L13/033

CPC: G10L13/02, G10L13/033, G10L13/04

Inventor: 张彤彤

Owner: BEIJING QIYU INFORMATION TECH CO LTD
