Voice style conversion method and device, equipment and storage medium

A voice style conversion technology, applied in fields such as speech analysis and instruments, that addresses the low style similarity of converted voices and the resulting degradation of the voice-changing effect.

Pending Publication Date: 2020-06-19

GUANGZHOU BAIGUOYUAN INFORMATION TECH CO LTD

6 Cites, 2 Cited by

Problems solved by technology

[0003] At present, the target voice for a conversion is usually determined in advance. For each predetermined target voice, a large number of historical source voices are used as training samples to train the voice style conversion for that target voice, so that the training set contains a large number of trained target voices and a source voice can later be converted accurately to any target voice in the training set. Voice style conversion for target voices that have not been trained, however, has certain limitations. In particular, when a source voice is converted to another target voice whose style differs greatly from the trained target voices in the training set, the style similarity between the converted voice and that target voice is low. Voice style conversion to such untrained target voices is therefore deficient, which degrades the final voice-changing effect.

Examples

Embodiment 1

[0028] Figure 1 is a flowchart of a speech style conversion method provided by Embodiment 1 of the present invention. This embodiment applies to changing the voice of any speech to a specific style while leaving the speech content unchanged. The voice style conversion method provided in this embodiment can be performed by the voice style conversion device provided in the embodiments of the present invention. The device can be implemented in software and/or hardware and integrated into the equipment that executes the method, which may be a user terminal configured with any voice-changing application.

[0029] Specifically, referring to Figure 1, the method may include the following steps:

[0030] S110. Acquire a source-style speech, a target-style speech, and an initial conversion speech.

[0031] Specifically, in order to show users various voices in various voice styles, the audio voice changing technology set in the voice changing application is usually used...

Embodiment 2

[0045] Figure 2A is a flowchart of a speech style conversion method provided by Embodiment 2 of the present invention, and Figure 2B is a schematic diagram of the principles of the various speech losses calculated during loss optimization in that method. This embodiment is optimized on the basis of the foregoing embodiment. Specifically, as shown in Figure 2A, this embodiment explains in detail how the speech content loss and the speech style loss are calculated.

[0046] Optionally, as shown in Figure 2A, this embodiment may include the following steps:

[0047] S210. Acquire source-style speech, target-style speech and initial conversion speech.

[0048] S220. Determine the speech content features of the source-style speech, the speech style features of the target-style speech, and the speech content and speech style features of the initial converted speech.

[0049] Specifically, ...
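As a rough illustration of the two losses this embodiment refers to, the following Python sketch computes a content loss and a style loss on feature arrays. The visible text does not say how the features are extracted or how the losses are defined, so everything here is an assumption: features are hypothetical `(channels, frames)` NumPy arrays, the content loss is a mean squared error, and the style descriptor is a Gram matrix borrowed from image style transfer. The function names are illustrative and do not come from the patent.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Channel-by-channel correlations averaged over frames (assumed style descriptor)."""
    n_frames = features.shape[1]
    return features @ features.T / n_frames


def content_loss(converted: np.ndarray, source: np.ndarray) -> float:
    """Mean squared difference between content features of equal shape."""
    return float(np.mean((converted - source) ** 2))


def style_loss(converted: np.ndarray, target: np.ndarray) -> float:
    """Mean squared difference between Gram matrices; frame counts may differ."""
    diff = gram_matrix(converted) - gram_matrix(target)
    return float(np.mean(diff ** 2))


# Random stand-ins for real speech features (assumption: 8 channels).
rng = np.random.default_rng(0)
source = rng.standard_normal((8, 50))   # source-style speech, 50 frames
target = rng.standard_normal((8, 70))   # target-style speech, 70 frames
initial = source.copy()                 # one possible initial conversion speech

print("content loss:", content_loss(initial, source))
print("style loss:  ", style_loss(initial, target))
```

Because the Gram matrix averages over frames, the target-style speech does not need the same length as the source-style speech in this sketch; only the channel count has to match.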

Embodiment 3

[0068] Figure 3 is a flowchart of a speech style conversion method provided by Embodiment 3 of the present invention. This embodiment is optimized on the basis of the foregoing embodiments. Specifically, this embodiment explains in detail how loss optimization is performed on the initial conversion speech.

[0069] Optionally, as shown in Figure 3, this embodiment may include the following steps:

[0070] S301. Acquire a source-style speech, a target-style speech, and an initial conversion speech.

[0071] S302. According to the speech content loss between the initial conversion speech and the source-style speech and the speech style loss between the initial conversion speech and the target-style speech, use the gradient descent algorithm to perform the corresponding gradient optimization on the gradient loss of the initial conversion speech, and obtain the new gradient loss.

[0072] Optionally, when performing loss optimization on the initial converte...
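The gradient descent step described in this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the speech signals are stood in for by hypothetical `(channels, frames)` feature arrays, the style descriptor is an assumed Gram matrix, the stopping rule (a loss threshold `tol` or a step limit `max_steps`) is an assumed form of the preset loss optimization condition, and `lr` and `style_weight` are illustrative parameters. The point it shows is that the update is applied to the converted speech itself rather than to the weights of a pre-trained model.

```python
import numpy as np


def gram(x: np.ndarray) -> np.ndarray:
    """Channel-by-channel correlations averaged over frames (assumed style descriptor)."""
    return x @ x.T / x.shape[1]


def loss_and_grad(x, source, target_gram, style_weight):
    """Return the combined loss and its analytic gradient with respect to x."""
    content_diff = x - source
    gram_diff = gram(x) - target_gram
    loss = np.mean(content_diff ** 2) + style_weight * np.sum(gram_diff ** 2)
    grad = 2.0 * content_diff / x.size
    grad = grad + style_weight * 4.0 * (gram_diff @ x) / x.shape[1]
    return float(loss), grad


def convert(source, target, initial, lr=0.1, style_weight=1.0,
            tol=1e-3, max_steps=200):
    """Gradient descent on the converted features themselves, not on model weights."""
    x = initial.copy()
    target_gram = gram(target)
    history = []
    for _ in range(max_steps):
        loss, grad = loss_and_grad(x, source, target_gram, style_weight)
        history.append(loss)
        if loss < tol:  # assumed form of the preset loss optimization condition
            break
        x = x - lr * grad
    return x, history


rng = np.random.default_rng(1)
source = rng.standard_normal((8, 50))   # hypothetical source-style features
target = rng.standard_normal((8, 70))   # hypothetical target-style features
converted, history = convert(source, target, initial=source.copy())
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f} after {len(history)} steps")
```

Starting from a copy of the source features is only one possible choice of initial conversion speech; the sketch works with any array of the same shape as `source`.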

Abstract

The embodiments of the invention disclose a voice style conversion method, device, equipment, and storage medium. The method comprises: acquiring a source-style voice, a target-style voice, and an initial conversion voice; performing loss optimization on the initial conversion voice, according to the voice content loss between the initial conversion voice and the source-style voice and the voice style loss between the initial conversion voice and the target-style voice, to obtain a new initial conversion voice; continuing the loss optimization until the new initial conversion voice meets a preset loss optimization condition; and taking that new initial conversion voice as the style-converted voice of the source-style voice under the target style. The technical scheme provided by the embodiments achieves accurate conversion of the source-style voice to the target style without pre-training voice style conversion for that target style, so it supports conversion to target voices that have not been pre-trained and improves the comprehensiveness and accuracy of voice style conversion.
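One way to write the loss optimization in the abstract compactly is given below. The notation is an assumption introduced here for illustration: $x_k$ is the conversion voice after $k$ optimization steps (with $x_0$ the initial conversion voice), $s$ is the source-style voice, $t$ is the target-style voice, $\alpha$ and $\beta$ are hypothetical loss weights, $\eta$ is a hypothetical step size, and $\varepsilon$ stands in for the preset loss optimization condition.

```latex
\begin{aligned}
L(x) &= \alpha\, L_{\text{content}}(x, s) + \beta\, L_{\text{style}}(x, t), \\
x_{k+1} &= x_k - \eta\, \nabla_x L(x_k), \qquad k = 0, 1, 2, \dots \\
\text{stop at the first } k \text{ with } & L(x_k) \le \varepsilon, \text{ and output } x_k .
\end{aligned}
```

Only $x$ is updated; no model parameters are trained for the target style, which matches the abstract's statement that pre-training for the target style is not needed.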

Description

Technical field

[0001] Embodiments of the present invention relate to the technical field of voice changing, and in particular to a voice style conversion method, device, equipment, and storage medium.

Background

[0002] With the rapid development of intelligent voice technology, audio voice-changing technology has become a popular emerging technology. It aims to convert a source voice into a target voice that has a particular voice style while leaving the voice content unchanged, for example when an app plays a piece of audio recorded by the user with the voice-changing effect of a specific target.

[0003] At present, the target voice for a conversion is usually determined in advance. For each predetermined target voice, a large number of historical source voices are used as training samples to train the voice style conversion for that target voice, so that the training set contains a large number of trained target voices ...

Application Information

IPC(8): G10L21/007, G10L21/013, G10L25/48

CPC: G10L21/007, G10L21/013, G10L25/48, G10L2021/0135

Inventor: 娄帆

Owner: GUANGZHOU BAIGUOYUAN INFORMATION TECH CO LTD
