Voice conversion method and device with emotion and rhythm

A speech conversion technology with emotion and prosody, applied in speech analysis, neural learning methods, biological neural network models, etc. It addresses the low naturalness of converted speech, complicated feature extraction engineering, and limited conversion quality, and achieves high speech quality and high similarity.

Active Publication Date: 2020-11-03

SICHUAN CHANGHONG ELECTRIC CO LTD

9 Cites, 10 Cited by

Problems solved by technology

[0003] Existing speech conversion methods have developed from parallel training data to non-parallel training data, and from one-to-many conversion to many-to-many conversion. There are several ways to implement them. The first aligns the speech features and parameters of non-parallel corpora and then trains a model to obtain the speech conversion function; the corpus alignment work is relatively complicated, and the conversion quality is limited. The second performs speech recognition on the speech to be converted to obtain the recognized text, and then synthesizes speech with the target speaker's speech synthesis model; this approach depends on the development of speech recognition and personalized speech synthesis. The third converts speech directly by extracting fundamental frequency features, speaker features, and content features and constructing a conversion function; its feature extraction engineering is more complicated, and the naturalness of the synthesized speech is low.

Examples

Embodiment 1

[0054] For ease of understanding, in this embodiment the source speaker can be thought of as the user, and the target speaker as a celebrity. The invention is used to convert the user's own voice into that of a chosen celebrity.

[0055] This embodiment discloses a voice conversion method with emotion and rhythm, including a training phase and a conversion phase. As shown in Figure 1, the training phase includes the following steps:

[0056] S11. Obtain a training corpus from multiple speakers, including a source speaker and a target speaker;

[0057] Optionally, existing high-quality public data sets such as VCTK or LibriSpeech can be used as the training corpus, as can self-recorded voice data containing multiple speakers.

[0058] S12. Perform acoustic feature extraction on the acquired training corpus;

[0059] Optionally, extract the Mel spectrum features from the training corpus. Specifically, the parameters are selected as follows: the window size is...
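As an illustration of step S12, the following is a minimal sketch of log-Mel spectrum extraction using only NumPy. The parameter values (16 kHz sample rate, 1024-sample window, 256-sample hop, 80 Mel bands) and all function names are assumptions chosen for the example; the patent's own parameter values are truncated above and are not reproduced here.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Hz -> Mel mapping
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=1024, win_length=1024,
                        hop_length=256, n_mels=80):
    """Return log-Mel features of shape (n_frames, n_mels).

    All default values are illustrative assumptions, not the patent's.
    """
    window = np.hanning(win_length)
    n_frames = 1 + (len(wave) - win_length) // hop_length
    frames = np.stack([
        wave[i * hop_length: i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    # Power spectrum per frame: (n_frames, n_fft // 2 + 1)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    # Floor before the log to avoid -inf on silent frames
    return np.log(np.maximum(mel, 1e-10))

# Usage: one second of a 440 Hz tone standing in for a speech recording
t = np.arange(16000) / 16000.0
features = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t))
```

In practice an audio library would normally handle loading and framing; the hand-written version is used here only to keep the sketch self-contained.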

Embodiment 2

[0093] The voice conversion device with emotion and rhythm described in the embodiment of the present invention includes:

[0094] The acoustic feature extraction module is used for extracting acoustic features from the input speech.

Abstract

The invention discloses a voice conversion method with emotion and rhythm, which comprises a training stage and a conversion stage. A style coding layer with an attention mechanism is used to calculate a style coding vector for a speaker. The style coding vector and the speaker's acoustic features are input together into a self-coding network with a box link for training and conversion, and finally the acoustic features are converted into audio by a vocoder. Building on traditional voice conversion methods, the invention introduces the speaker's rhythm and emotion information, so that the converted speech carries the emotion and rhythm of the target speaker's voice. The method achieves high similarity and high voice quality in many-to-many speaker voice conversion tasks, covering both intra-set and extra-set speaker pairings.
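The abstract does not say how the style coding layer applies attention. As one possible reading, the sketch below assumes a bank of learnable style tokens attended to by an utterance-level reference vector, in the manner of style-token approaches in speech synthesis. The token bank, the projection matrices, the dimensions, and every name in the code are assumptions for illustration, not details confirmed by the patent.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array
    e = np.exp(x - np.max(x))
    return e / e.sum()

def style_embedding(reference, tokens, w_query, w_key):
    """Scaled dot-product attention over a style-token bank (assumed design).

    reference: (d_ref,) utterance-level summary of the acoustic features
    tokens:    (n_tokens, d_tok) style token bank
    Returns the style vector (d_tok,) and the attention weights (n_tokens,).
    """
    q = reference @ w_query                  # query, shape (d_attn,)
    k = tokens @ w_key                       # keys, shape (n_tokens, d_attn)
    scores = k @ q / np.sqrt(q.shape[0])     # one score per token
    weights = softmax(scores)
    return weights @ tokens, weights         # weighted sum of tokens

# Usage with random stand-ins for trained parameters (hypothetical sizes)
rng = np.random.default_rng(0)
d_ref, d_tok, d_attn, n_tokens = 128, 256, 64, 10
reference = rng.standard_normal(d_ref)
tokens = rng.standard_normal((n_tokens, d_tok))
w_query = rng.standard_normal((d_ref, d_attn)) * 0.1
w_key = rng.standard_normal((d_tok, d_attn)) * 0.1
style, weights = style_embedding(reference, tokens, w_query, w_key)
```

In a trained system the tokens and projections would be learned jointly with the conversion network, and the resulting style vector would be passed to it alongside the acoustic features.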

Description

Technical field

[0001] The invention relates to the technical field of speech processing, in particular to a method and device for converting speech with emotion and rhythm.

Background

[0002] Voice conversion is a speech technology that retains the content of the source speaker's speech while converting it into the target speaker's voice. It has a wide range of application scenarios: for example, users can convert their own voices into those of their favorite stars, or use it like the "voice-changing bow tie" that anime fans talk about. It is also of great significance in fields such as speech synthesis, voiceprint recognition, and voiceprint security.

[0003] Existing speech conversion methods have developed from parallel training data to non-parallel training data, and from one-to-many conversion to many-to-many conversion. There are several ways to implement them: one is to use a certain method to align the speech features and parameters of non-parallel corpora ...

Application Information

IPC(8): G10L19/16, G10L17/02, G10L17/04, G10L17/18, G06N3/04, G06N3/08

CPC: G10L19/173, G10L17/02, G10L17/04, G10L17/18, G06N3/08, G06N3/045

Inventor: 朱海, 王昆, 周琳珉

Owner: SICHUAN CHANGHONG ELECTRIC CO LTD
