
A conversion method and device capable of converting any voice into multiple voices

A voice conversion technology and method, applicable to speech analysis, speech recognition, and related instruments. It addresses the dependence of existing approaches on speech recognition accuracy and aims to improve processing quality.

Active Publication Date: 2021-11-19
Chengdu Chipintelli Technology Co., Ltd.
Cites: 16; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] Methods based on adversarial learning achieve good conversion results on the training set, but their drawback is that they can only convert the audio of speakers included in the training set. Methods based on a speech recognition model can convert to any timbre, but they depend on the accuracy of speech recognition.



Examples


Preparation example

[0127] The synthesis method comprises the following steps:

[0128] S9. Extract the PPG (phonetic posteriorgram) features of the audio to be converted and feed them into the first preprocessing network;

[0129] S10. Extract the log fundamental frequency (log-F0) features of the audio to be converted and of an arbitrary utterance of the target speaker, compute their means and variances, and apply the following linear mapping to obtain the mapped feature lf0':

[0130] $$lf0' = \frac{\sigma_t}{\sigma_s}\left(lf0_s - \mu_s\right) + \mu_t \qquad (5)$$

[0131] where $lf0_s$ is the log-F0 feature of the audio to be converted; $\mu_s$ and $\sigma_s$ are the mean and the standard deviation (square root of the variance) of the log-F0 feature of the audio to be converted; and $\mu_t$ and $\sigma_t$ are the mean and the standard deviation of the log-F0 feature of the target speaker.

[0132] S11. The feature lf0' after ma...
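The log-F0 linear mapping of step S10 can be sketched in Python with NumPy. This is a minimal sketch, not the patent's implementation: it assumes the usual mean-and-variance normalization form, lf0' = (lf0_s - μ_s)·σ_t/σ_s + μ_t with σ taken as standard deviations, and it assumes the F0 extractor reports unvoiced frames as F0 = 0, which are excluded from the statistics. The helper names `voiced_log_f0` and `map_log_f0` are hypothetical.

```python
import numpy as np


def voiced_log_f0(f0_hz):
    """Return log-F0 of voiced frames only (frames with F0 > 0, an assumption)."""
    f0 = np.asarray(f0_hz, dtype=float)
    return np.log(f0[f0 > 0])


def map_log_f0(lf0_s, lf0_t):
    """Map source log-F0 onto the target speaker's log-F0 statistics.

    lf0_s: voiced log-F0 of the audio to be converted.
    lf0_t: voiced log-F0 of an arbitrary utterance of the target speaker.
    """
    mu_s, sigma_s = lf0_s.mean(), lf0_s.std()  # source mean / standard deviation
    mu_t, sigma_t = lf0_t.mean(), lf0_t.std()  # target mean / standard deviation
    # Normalize with the source statistics, then rescale with the target ones.
    return (lf0_s - mu_s) * (sigma_t / sigma_s) + mu_t


# Usage with made-up F0 contours in Hz (0 marks unvoiced frames).
f0_src = np.array([0.0, 100.0, 120.0, 0.0, 110.0, 130.0])
f0_tgt = np.array([200.0, 0.0, 220.0, 260.0, 240.0])
lf0_mapped = map_log_f0(voiced_log_f0(f0_src), voiced_log_f0(f0_tgt))
```

After the mapping, the source contour keeps its shape (the transform is monotonic) while its mean and spread match the target speaker's. Excluding unvoiced frames is needed here because log(0) is undefined; the excerpt does not say how the original method treats them.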

Specific embodiment

[0154] Prepare the training corpus, consisting of multi-speaker Chinese audio data and multi-speaker English audio data, and label each utterance with its speaker ID.
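Labeling a mixed Chinese/English multi-speaker corpus with speaker IDs can be sketched as below. The `Utterance` record, the file layout, and `label_speakers` are hypothetical; the sketch also assumes speaker names are unique across both corpora, so the name alone identifies a speaker.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Utterance:
    path: str      # audio file path (hypothetical layout)
    speaker: str   # speaker name from the source corpus
    language: str  # "zh" for Chinese, "en" for English


def label_speakers(
    utts: Iterable[Utterance],
) -> Tuple[List[Tuple[Utterance, int]], Dict[str, int]]:
    """Assign each speaker an integer ID and tag every utterance with it."""
    utts = list(utts)
    # Sorting the names makes the ID assignment reproducible across runs.
    speakers = sorted({u.speaker for u in utts})
    speaker_to_id = {name: idx for idx, name in enumerate(speakers)}
    labeled = [(u, speaker_to_id[u.speaker]) for u in utts]
    return labeled, speaker_to_id


corpus = [
    Utterance("zh/spk_a/0001.wav", "spk_a", "zh"),
    Utterance("en/spk_b/0001.wav", "spk_b", "en"),
    Utterance("zh/spk_a/0002.wav", "spk_a", "zh"),
]
labeled, speaker_to_id = label_speakers(corpus)
```

Keeping the language tag on each utterance costs nothing and leaves room for per-language statistics later, while the speaker ID is what a speaker encoder or embedding table would consume.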


Abstract

A conversion method and device capable of converting any voice into multiple voices. The conversion method comprises the following steps: preparing the corpora of multiple target speakers as the training corpus; extracting the PPG features of each training utterance; obtaining comprehensive features; obtaining the speaker-encoding features of each target speaker in the training set, from which a mean simulation feature γ and a variance simulation feature β are derived; training a conversion model that converts the comprehensive features into Mel features; and, with γ and β as the style input of the conversion model and the comprehensive features as its content input, decoding the Mel spectra of different speakers to synthesize different voices. The invention better decouples speech content information and reduces the impact that inaccurate PPG features, extracted by a speech recognition model, have on voice conversion.
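The abstract names γ and β as the style input and the comprehensive features as the content input, but does not say how the two meet inside the conversion model. One common way to realize such a content/style split is adaptive instance normalization (AdaIN); the sketch below assumes that scheme, which this excerpt does not confirm. It also assumes the speaker encoding is mapped to γ and β by two hypothetical linear projections (`w_gamma`, `w_beta`), with a softplus keeping β positive. Following the abstract's naming, γ sets each channel's mean and β its variance.

```python
import numpy as np


def softplus(z):
    """Numerically stable log(1 + exp(z)); keeps a variance-like value positive."""
    return np.logaddexp(0.0, z)


def speaker_to_style(spk_emb, w_gamma, w_beta):
    """Project a speaker encoding to per-channel (gamma, beta).

    w_gamma, w_beta: hypothetical learned projections of shape (channels, emb_dim).
    """
    gamma = w_gamma @ spk_emb          # mean-like style feature
    beta = softplus(w_beta @ spk_emb)  # variance-like style feature, > 0
    return gamma, beta


def style_condition(content, gamma, beta, eps=1e-5):
    """AdaIN-style conditioning of content features of shape (channels, frames)."""
    mean = content.mean(axis=1, keepdims=True)
    std = content.std(axis=1, keepdims=True)
    normalized = (content - mean) / (std + eps)  # strip per-channel statistics
    # Impose the speaker's statistics: mean gamma, variance beta.
    return normalized * np.sqrt(beta)[:, None] + gamma[:, None]


rng = np.random.default_rng(0)
content = rng.normal(size=(4, 200))  # stand-in for the comprehensive features
gamma, beta = speaker_to_style(rng.normal(size=8),
                               rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
styled = style_condition(content, gamma, beta)
```

Because the normalization removes the content features' own per-channel statistics before the speaker's are imposed, the same content can be decoded with different (γ, β) pairs to obtain different voices, which is the behavior the abstract describes.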

Description

Technical field

[0001] The invention belongs to the technical field of speech synthesis, and in particular relates to a conversion method and device capable of converting any voice into multiple voices.

Background

[0002] Voice conversion converts source speech data into speech data of a specified speaker while keeping the spoken content unchanged. Traditional voice-changing technology processes the speech signal, adjusting pitch, speaking rate, and similar parameters to turn the original audio into a machine-like voice, and offers only a single conversion mode. Unlike traditional voice changing, voice conversion can control the emotion, prosody, and other attributes of the target voice while keeping the spoken content consistent. Voice conversion can be used in scenarios such as virtual anchors, voice reshaping, prosody/emotion conversion, and voice style conversion.

[0003] Acco...


Application Information

Patent type & authority: Patents (China)
IPC (8): G10L15/02; G10L15/06; G10L15/16
CPC: G10L15/02; G10L15/063; G10L15/16
Inventors: Cao Yanyan, Chen Peiyun, Gao Junxiao
Owner: Chengdu Chipintelli Technology Co., Ltd.