A conversion method and device capable of converting any voice into multiple voices

A voice conversion method, applied in fields such as voice analysis, voice recognition, and instruments, that addresses the dependence of conversion quality on the accuracy of voice recognition and improves the processing effect.

Active Publication Date: 2021-11-19

成都启英泰伦科技有限公司

16 Cites, 0 Cited by


AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

[0004] Methods based on adversarial learning achieve good conversion results on the training set, but they can only convert the audio of speakers who appear in the training set.

Methods based on a speech recognition model can convert to any timbre, but their quality depends on the accuracy of speech recognition.

Examples

preparation example Construction

[0127] The synthesis method comprises the following steps:

[0128] S9. Extract the PPG (phonetic posteriorgram) feature of the audio to be converted and send it to the first preprocessing network;

[0129] S10. Extract the logarithmic fundamental frequency (log-F0) feature of the audio to be converted and of any audio from the target speaker, calculate the mean and variance of each, and apply a linear mapping according to the following formula to obtain the mapped feature lf0':

[0130] $$lf0' = \frac{lf0_s - \mu_s}{\sigma_s}\,\sigma_t + \mu_t \qquad (5)$$

[0131] where $lf0_s$ is the log-F0 feature of the audio to be converted, $\mu_s$ is the mean of the log-F0 feature of the audio to be converted, $\mu_t$ is the mean of the log-F0 feature of the target speaker, $\sigma_s$ is the variance of the log-F0 feature of the audio to be converted, and $\sigma_t$ is the variance of the log-F0 feature of the target speaker;

[0132] S11. The feature lf0' after ma...
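The linear mapping of step S10 can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: it assumes the usual mean-variance normalization form of the mapping (formula (5) is not reproduced in this listing), uses the standard deviation as the spread statistic so that the mapped feature matches the target's spread, and assumes only voiced frames are passed in. The function name `map_log_f0` and the F0 values are hypothetical.

```python
import math
from statistics import fmean, pstdev


def map_log_f0(lf0_source, lf0_target):
    """Shift and scale source log-F0 values to the target speaker's statistics.

    Assumption: mapping is (x - mu_s) / sigma_s * sigma_t + mu_t, with sigma
    taken as the population standard deviation. Inputs hold voiced frames only.
    """
    mu_s, sigma_s = fmean(lf0_source), pstdev(lf0_source)
    mu_t, sigma_t = fmean(lf0_target), pstdev(lf0_target)
    if sigma_s == 0:
        raise ValueError("source log-F0 has zero spread; cannot normalize")
    return [(x - mu_s) / sigma_s * sigma_t + mu_t for x in lf0_source]


# Hypothetical voiced-frame F0 values in Hz (illustrative, not from the patent).
source_lf0 = [math.log(f) for f in (100.0, 110.0, 120.0, 130.0)]
target_lf0 = [math.log(f) for f in (200.0, 220.0, 240.0, 260.0)]

mapped_lf0 = map_log_f0(source_lf0, target_lf0)
```

After the mapping, `mapped_lf0` keeps the source's pitch contour shape but has the mean and spread of the target speaker's log-F0.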

specific Embodiment

[0154] Prepare the training corpus: multi-speaker Chinese audio data and multi-speaker English audio data, with each recording labeled with its speaker number.


Abstract

A conversion method and device capable of converting any voice into multiple voices. The conversion method comprises the following steps: preparing the corpora of a plurality of target speakers as a training corpus; extracting the PPG features of each training corpus; obtaining comprehensive features; obtaining the speaker encoding features of the target speakers in the training set, and obtaining the mean simulation feature γ and the variance simulation feature β; training a conversion model that converts the comprehensive features into Mel features; and, with the mean simulation feature γ and the variance simulation feature β as the style input of the conversion model and the comprehensive features as its content input, decoding the Mel spectra of different speakers to synthesize different voices. The invention better decouples speech content information and reduces the impact on voice conversion of inaccurate PPG features extracted by a speech recognition model.
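One way to picture the split between style input and content input is an AdaIN-style conditional normalization. The sketch below is an assumption for illustration only: the abstract does not give the formula, and the patent's conversion model is a trained network. Following the abstract's naming, β (variance simulation) is treated as a per-channel scale and γ (mean simulation) as a per-channel shift, which is the reverse of the more common convention. The function `modulate` and all values are hypothetical.

```python
from statistics import fmean, pstdev


def modulate(content, gamma, beta, eps=1e-5):
    """Apply per-channel style statistics to content features.

    content: list of frames, each a list of channel values (content input).
    gamma:   per-channel shift (assumed role of the mean simulation feature).
    beta:    per-channel scale (assumed role of the variance simulation feature).
    """
    n_channels = len(content[0])
    out = [[0.0] * n_channels for _ in content]
    for c in range(n_channels):
        column = [frame[c] for frame in content]
        mu, sigma = fmean(column), pstdev(column)
        for t, x in enumerate(column):
            # Remove the content's own statistics, then impose the style's.
            normalized = (x - mu) / (sigma + eps)
            out[t][c] = beta[c] * normalized + gamma[c]
    return out


# Hypothetical 3-frame, 2-channel content features and one speaker's style.
content = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
gamma = [5.0, -1.0]
beta = [2.0, 0.5]

styled = modulate(content, gamma, beta)
```

Swapping in a different speaker's γ and β while keeping `content` fixed changes the output statistics without touching the frame-to-frame pattern, which is the intuition behind decoding several voices from one content input.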

Description

Technical field

[0001] The invention belongs to the technical field of speech synthesis, and in particular relates to a conversion method and device capable of converting any voice into multiple voices.

Background technique

[0002] Voice conversion technology converts source voice data into the voice data of a specified speaker while keeping the pronunciation content consistent. Traditional voice-changing technology processes the voice signal, adjusting the audio pitch, speech rate, and so on to turn the original audio into a machine-like voice, so its conversion mode is limited. In contrast, voice conversion technology can control the emotion, rhythm, and other information of the target voice while keeping the pronunciation content consistent. It can be used in scenarios such as virtual anchors, voice reshaping, rhythm/emotion conversion, and voice style conversion.

[0003] Acco...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G10L15/02, G10L15/06, G10L15/16

CPC: G10L15/02, G10L15/063, G10L15/16

Inventor: 曹艳艳, 陈佩云, 高君效

Owner: 成都启英泰伦科技有限公司
