Speech synthesis method, device and electronic equipment based on timbre cloning

A speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc., that addresses the time-consuming and costly synthesis of variable voice, reducing recording time and saving recording cost.

Active Publication Date: 2021-02-02
BEIJING QIYU INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] The present invention aims to solve the technical problem that synthesizing the variable voice of a target timbre is time-consuming and economically costly.

Examples

Detailed description of embodiments

[0062] Referring to Figure 1, which is a flowchart of the speech synthesis method based on timbre cloning provided by the present invention, the method includes:

[0063] S1. Train a basic TTS model on an open-source corpus;

[0064] TTS (text-to-speech) is a technology for converting text into sound. It mainly comprises front-end processing, the TTS model itself, and a vocoder. Front-end processing operates on a corpus in text form and converts arbitrary text into linguistic features; it usually includes submodules for text regularization, word segmentation, part-of-speech prediction, grapheme-to-phoneme conversion, polyphone disambiguation, and prosody estimation. Text regularization converts written expressions into spoken ones, for example 1% into "one percent" and 1kg into "one kilogram". Word segmentation and part-of-speech prediction are the basis of prosody prediction. Grapheme-to-phoneme conversion of speech into...
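The text-regularization submodule described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's front end: the function name `normalize`, the lookup tables, and the English spoken forms in them are assumptions made for the example, and a real system would need a complete number-to-words routine and a far larger unit inventory.

```python
import re

# Hypothetical, deliberately tiny lookup tables (assumptions for this sketch).
NUMBER_WORDS = {"1": "one", "2": "two", "3": "three", "5": "five", "10": "ten"}
UNIT_WORDS = {"%": "percent", "kg": "kilogram", "km": "kilometer"}

# A run of digits, optional whitespace, then one of the known units.
PATTERN = re.compile(r"(\d+)\s*(%|kg|km)")


def normalize(text: str) -> str:
    """Rewrite written quantities such as '1kg' into a spoken form."""

    def spoken(match: re.Match) -> str:
        number, unit = match.group(1), match.group(2)
        if number not in NUMBER_WORDS:
            # Leave quantities we cannot verbalize untouched.
            return match.group(0)
        word = UNIT_WORDS[unit]
        # Pluralize measurement units for quantities other than one;
        # "percent" never takes a plural.
        if number != "1" and unit != "%":
            word += "s"
        return f"{NUMBER_WORDS[number]} {word}"

    return PATTERN.sub(spoken, text)


print(normalize("Add 1kg of flour and 1% salt."))
```

Keeping the spoken forms in tables rather than in the code makes it straightforward to swap in the wording a given language or house style requires.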

Abstract

The invention discloses a speech synthesis method, device and electronic equipment based on timbre cloning. The method comprises: training a basic TTS model on an open-source corpus; fine-tuning the basic model on a target-timbre corpus to obtain a fine-tuned model; generating variable voice in the target timbre from a variable corpus and the fine-tuned model; and synthesizing target-timbre speech from the variable voice and the fixed voice. The method first trains on the open-source corpus and then fine-tunes the TTS network on a small amount of target-timbre corpus. Compared with exhaustive manual recording or traditional TTS synthesis from a high-quality corpus, the method effectively shortens the recording time of the target-timbre corpus and greatly reduces the recording cost.
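The idea of training on a large corpus first and then adapting with a small one can be illustrated with a toy that involves no audio at all. The sketch below is only an analogy under stated assumptions: the one-variable linear model, both data sets, the learning rate, and the step counts are hypothetical and stand in for the TTS network and corpora, which the source does not specify at this level.

```python
from typing import List, Tuple

Params = Tuple[float, float]          # (weight, bias) of the toy model y = w*x + b
Corpus = List[Tuple[float, float]]    # (input, target) pairs


def mse(params: Params, data: Corpus) -> float:
    """Mean squared error of the toy model on a data set."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params: Params, data: Corpus, lr: float, steps: int) -> Params:
    """Plain gradient descent, starting from the given parameters."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


# Stand-in for the large open-source corpus: 41 samples of y = 2x + 1.
base_corpus = [(x / 10, 2 * (x / 10) + 1.0) for x in range(-20, 21)]
# Stand-in for the small target-timbre corpus: 3 samples of y = 2x + 1.5.
target_corpus = [(-1.0, -0.5), (0.0, 1.5), (1.0, 3.5)]

# Step 1: train the base model from scratch on the large corpus.
base_model = train((0.0, 0.0), base_corpus, lr=0.1, steps=500)
# Step 2: fine-tune, i.e. continue training from the base parameters
# on the small target corpus for far fewer steps.
fine_tuned = train(base_model, target_corpus, lr=0.1, steps=50)

print("base model error on target data:", mse(base_model, target_corpus))
print("fine-tuned error on target data:", mse(fine_tuned, target_corpus))
```

Because fine-tuning starts from parameters that already capture what the two data sets share, three samples and a short run are enough to close the remaining gap in this toy.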

Description

Technical field

[0001] The present invention relates to the technical field of intelligent voice, and in particular to a speech synthesis method, device, electronic equipment and computer-readable medium based on timbre cloning.

Background

[0002] In intelligent voice interaction, voice robots usually interact with users through preset scripts. A preset script is generally assembled from fixed voice and variable voice. Fixed voice is common to all users, whereas variable voice must change for each individual user. For example, in the preset script "Hello! Mr. xx.", "Hello" and "Mr." apply to all male users and are therefore fixed voice, while "xx" must be spoken according to each male user's name and is therefore variable voice.

[0003] In the prior art, the fixed voice is pre-recorded by a professional sound engineer, and a generation method of the variable voice is to reduce the variable vo...
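The split between fixed and variable voice in the "Hello! Mr. xx." example can be sketched as follows. This is a minimal illustration under assumptions: the `{name}` slot syntax, the function names `split_script` and `assemble`, the use of plain float lists as waveforms, and the stand-in synthesizer are all hypothetical; in the described method the variable part would come from the fine-tuned TTS model and the fixed parts from recordings.

```python
import re
from typing import Callable, Dict, List, Tuple

Waveform = List[float]  # toy stand-in for audio samples

SLOT = re.compile(r"\{(\w+)\}")


def split_script(template: str) -> List[Tuple[str, str]]:
    """Split a script template into ('fixed', text) and ('variable', slot) parts."""
    parts: List[Tuple[str, str]] = []
    pos = 0
    for match in SLOT.finditer(template):
        if match.start() > pos:
            parts.append(("fixed", template[pos:match.start()]))
        parts.append(("variable", match.group(1)))
        pos = match.end()
    if pos < len(template):
        parts.append(("fixed", template[pos:]))
    return parts


def assemble(
    template: str,
    values: Dict[str, str],
    fixed_recordings: Dict[str, Waveform],
    synthesize: Callable[[str], Waveform],
) -> Waveform:
    """Concatenate pre-recorded fixed segments with synthesized variable ones."""
    audio: Waveform = []
    for kind, content in split_script(template):
        if kind == "fixed":
            audio.extend(fixed_recordings[content])    # look up the recording
        else:
            audio.extend(synthesize(values[content]))  # generate per user
    return audio


template = "Hello! Mr. {name}."
fixed_recordings = {"Hello! Mr. ": [0.1, 0.2], ".": [0.0]}


def fake_tts(text: str) -> Waveform:
    """Hypothetical synthesizer: one constant sample per character."""
    return [0.5] * len(text)


audio = assemble(template, {"name": "Zhang"}, fixed_recordings, fake_tts)
print(split_script(template))
print(len(audio), "samples")
```

Only the `{name}` segment has to be generated per user; every other segment is recorded once and reused, which is where the saving in recording effort comes from.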

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L13/02, G10L13/04, G10L13/033
CPC: G10L13/02, G10L13/033, G10L13/04
Inventor: 张彤彤
Owner: BEIJING QIYU INFORMATION TECH CO LTD