
Voice cloning method and device based on single-speaker speech synthesis data set

A speech synthesis and speech data technology, applied in speech synthesis, speech analysis, speech recognition, etc., that can solve the problems of high cost and the difficulty of labeling speech synthesis data sets.

Active Publication Date: 2020-07-07
杭州博盾习言科技有限公司

AI Technical Summary

Problems solved by technology

[0003] Existing voice cloning techniques have been shown to generate high-quality speech when trained on speech synthesis data from a large number of speakers, but such a speech synthesis data set often requires a large number of speakers. Speech synthesis data sets also require clean audio with no obvious background noise, and are generally produced in recording studios. At present, neither free nor commercial speech synthesis data sets can meet these requirements. Even if resources are spent to produce such a data set, the cost of labeling it is extremely high.


Examples

Embodiment 1

[0046] Embodiment 1 provides a voice cloning method based on a single-speaker speech synthesis data set. A single-speaker speech synthesis model is trained on that speaker's speech synthesis data set; the target text and the target speaker's voice are then processed by the single-speaker speech synthesis model, the voiceprint model, and the voice conversion model to obtain speech of the target text in the voice of the target speaker. The method needs only a single-speaker speech synthesis data set to clone the voice of the target speaker. Processing single-speaker speech synthesis data is simple and convenient, and there is no need to collect and process speech synthesis data from a large number of speakers, which greatly reduces manpower, time, and capital costs.

[0047] As shown in Figure 1, the voice cloning method based on a single-speaker speech synthesis data set includes the following steps:

[0048] S110. Acquire a single-speaker speech synthesis data set, and train a single-speaker speech synthesis model based o...

Embodiment 2

[0071] Embodiment 2 discloses a voice cloning device based on a single-speaker speech synthesis data set. The device corresponds to the above embodiment and is its virtual device structure. As shown in Figure 2, it includes:

[0072] The speech synthesis module 210 is used to obtain a single-speaker speech synthesis data set, and train a single-speaker speech synthesis model based on the single-speaker speech synthesis data set;

[0073] The voiceprint module 220 is used to obtain a multi-speaker voice data set, and train a voiceprint model based on the multi-speaker voice data set;

[0074] The voice conversion module 230 is used to pass the multi-speaker voice data set through the trained voiceprint model to obtain a voiceprint feature data set, and to train the voice conversion model based on the voiceprint feature data set;

[0075] The voice cloning module 240 is used to obtain the target text and the target speaker's voice, the target...
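The hand-off from the voiceprint module 220 to the voice conversion module 230 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the type aliases `Audio` and `Voiceprint`, the function `build_voiceprint_feature_set`, and the toy model are all hypothetical, and pairing each utterance with its voiceprint is an assumption about what the "voiceprint feature data set" contains.

```python
from typing import Callable, List, Sequence, Tuple

Audio = List[float]             # hypothetical stand-in for a waveform
Voiceprint = Tuple[float, ...]  # hypothetical stand-in for a voiceprint feature vector


def build_voiceprint_feature_set(
    voiceprint_model: Callable[[Audio], Voiceprint],
    multi_speaker_audio: Sequence[Audio],
) -> List[Tuple[Audio, Voiceprint]]:
    """Pair each utterance with the voiceprint the trained model computes for it.

    Assumption: the voice conversion model is later trained on these pairs.
    """
    return [(utt, voiceprint_model(utt)) for utt in multi_speaker_audio]


def toy_voiceprint_model(audio: Audio) -> Voiceprint:
    """Toy stand-in for a trained voiceprint model: the mean sample value."""
    return (sum(audio) / len(audio),)


# Two made-up utterances stand in for the multi-speaker voice data set.
feature_set = build_voiceprint_feature_set(
    toy_voiceprint_model, [[0.0, 2.0], [1.0, 3.0, 5.0]]
)
```

The point of the sketch is the ordering: the voiceprint model must already be trained before the feature set is built, and the feature set must exist before the voice conversion model is trained.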

Embodiment 3

[0078] Figure 3 is a schematic structural diagram of an electronic device provided in Embodiment 3 of the present invention. As shown in Figure 3, the electronic device includes a processor 310, a memory 320, an input device 330, and an output device 340. The device may have one or more processors 310; Figure 3 takes one processor 310 as an example. The processor 310, memory 320, input device 330, and output device 340 can be connected by a bus or by other means; Figure 3 takes connection via a bus as an example.

[0079] The memory 320, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the voice cloning method based on the single-speaker speech synthesis data set in the embodiment of the present invention (for example, the speech synthesis module 210, the voiceprint module 220, the voice conversion module 230...



Abstract

The invention discloses a voice cloning method, device, electronic equipment, and computer storage medium based on a single-speaker speech synthesis data set, and relates to the technical field of voice cloning. The method includes the following steps: training a single-speaker speech synthesis model based on a single-speaker speech synthesis data set; training a voiceprint model based on a multi-speaker voice data set; passing the multi-speaker voice data set through the trained voiceprint model to obtain a voiceprint feature data set, and training a voice conversion model based on the voiceprint feature data set; and passing the target text and the target speaker's voice through the trained single-speaker speech synthesis model, voiceprint model, and voice conversion model to obtain speech of the target text in the target speaker's voice. The method needs only one single-speaker speech synthesis data set to clone the target speaker's voice. Processing the speech synthesis data is simple and convenient, and there is no need to collect and process speech synthesis data from a large number of speakers, which greatly reduces costs.
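The final cloning step can be sketched as a composition of the three trained models. This is a hedged illustration only: the source does not spell out how the models are wired together at inference time, so the order used here (synthesize the text in the single speaker's voice, extract the target speaker's voiceprint, then convert) is an assumption. The name `clone_voice`, the type aliases, and the toy stand-in models are hypothetical.

```python
from typing import Callable, List, Tuple

Audio = List[float]             # hypothetical stand-in for a waveform
Voiceprint = Tuple[float, ...]  # hypothetical stand-in for a voiceprint feature vector


def clone_voice(
    target_text: str,
    target_voice: Audio,
    speech_synthesis: Callable[[str], Audio],
    voiceprint_model: Callable[[Audio], Voiceprint],
    voice_conversion: Callable[[Audio, Voiceprint], Audio],
) -> Audio:
    """Assumed inference flow: text -> source speech -> converted speech."""
    # 1. Synthesize the text in the voice of the single training speaker.
    source_speech = speech_synthesis(target_text)
    # 2. Extract the target speaker's voiceprint from a sample of their voice.
    target_print = voiceprint_model(target_voice)
    # 3. Convert the synthesized speech toward the target voiceprint.
    return voice_conversion(source_speech, target_print)


# Toy stand-ins so the sketch runs; real models would be trained networks.
def toy_tts(text: str) -> Audio:
    return [float(ord(c)) for c in text]


def toy_voiceprint(audio: Audio) -> Voiceprint:
    return (sum(audio) / len(audio),)


def toy_conversion(speech: Audio, print_: Voiceprint) -> Audio:
    return [sample + print_[0] for sample in speech]


cloned = clone_voice("ab", [1.0, 3.0], toy_tts, toy_voiceprint, toy_conversion)
```

Passing the models in as callables keeps the sketch independent of any particular speech synthesis, voiceprint, or voice conversion architecture, none of which are specified in the text above.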

Description

Technical field

[0001] The present invention relates to the technical field of voice cloning, in particular to a voice cloning method, device, electronic equipment and storage medium based on a single-speaker speech synthesis data set.

Background

[0002] With the development of speech technology, people have higher requirements for output audio, and hope that audio generated from text input sounds like the voice of a specific speaker. Voice cloning technology can meet this requirement for personalized voice output. The ultimate goal of voice cloning technology is to completely simulate someone's voice.

[0003] Existing voice cloning techniques have been proven capable of generating high-quality speech based on speech synthesis data from a large number of speakers, but a set of speech synthesis datasets often requires a large number of speakers. Speech synthesis datasets require clean sound and no obvious background noise, and ...


Application Information

IPC(8): G10L13/02, G10L13/08, G10L15/06, G10L15/16, G10L17/02, G10L17/04, G10L17/18, G10L19/16
CPC: G10L13/02, G10L13/08, G10L15/063, G10L15/16, G10L17/02, G10L17/04, G10L17/18, G10L19/16
Inventors: 房树明, 朱鹏程, 燕鹏举, 王洪涛, 顾王一, 毕成
Owner: 杭州博盾习言科技有限公司