Voice conversion method and device with emotion and rhythm

A voice conversion and prosody technology, applied in speech analysis, neural learning methods, biological neural network models, etc. It addresses the low naturalness of converted speech, complicated extraction engineering, and limited conversion effect, and it achieves high speech quality and high similarity.

Active Publication Date: 2020-11-03
SICHUAN CHANGHONG ELECTRIC CO LTD

AI Technical Summary

Problems solved by technology

[0003] Existing speech conversion methods have developed from parallel training data to non-parallel training data, and from one-to-many conversion to many-to-many conversion. There are several ways to achieve this. One is to align the speech features and parameters of non-parallel corpora by some method, and then train a model to obtain the speech conversion function. The corpus alignment work in this approach is relatively complicated, and the conversion effect is relatively limited. Another is to perform speech recognition on th



Examples


Embodiment 1

[0054] For ease of understanding, in this embodiment the source speaker can be understood as the user, and the target speaker can be understood as a celebrity. The invention is used to convert the user's own voice into that of a particular celebrity.

[0055] This embodiment discloses a voice conversion method with emotion and rhythm, including a training phase and a conversion phase. As shown in Figure 1, the training phase includes the following steps:

[0056] S11. Obtain a training corpus of multiple speakers, including a source speaker and a target speaker;

[0057] Optionally, existing high-quality public data sets such as VCTK or LibriSpeech can be used as the training corpus, as can self-recorded voice data containing multiple speakers.

[0058] S12. Perform acoustic feature extraction on the acquired training corpus;

[0059] Optionally, extract the Mel spectrum features from the training corpus. Specifically, the parameters are selected as follows: the window size is...
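Step S12 can be illustrated with a minimal NumPy sketch of log-Mel spectrum extraction. The excerpt cuts off before the patent's parameter values, so `win_length`, `hop_length`, and `n_mels` below are hypothetical placeholders, not the values chosen in the invention. The function names are likewise illustrative.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; the source does not name one).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 band edges, evenly spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr, win_length=1024, hop_length=256, n_mels=80):
    """Return log-mel features of shape (n_frames, n_mels).

    win_length, hop_length and n_mels are hypothetical defaults.
    """
    wave = np.asarray(wave, dtype=float)
    if wave.size < win_length:
        wave = np.pad(wave, (0, win_length - wave.size))
    n_frames = 1 + (wave.size - win_length) // hop_length
    # Index matrix that slices the waveform into overlapping frames.
    idx = np.arange(win_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(win_length)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, win_length, n_mels).T
    return np.log(mel + 1e-10)  # small floor avoids log(0)

# Usage: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
features = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
```

In practice an audio library would usually handle this step. The sketch only makes the framing, windowing, and filterbank stages explicit.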

Embodiment 2

[0093] The voice conversion device with emotion and rhythm described in the embodiment of the present invention includes:

[0094] The acoustic feature extraction module is used for extracting acoustic features from the input speech.


Abstract

The invention discloses a voice conversion method with emotion and rhythm, comprising a training stage and a conversion stage. A style coding layer with an attention mechanism is used to calculate a style coding vector of a speaker. The style coding vector and the speaker's acoustic features are input together into a self-coding network with a box link for training and conversion, and the acoustic features are finally converted into audio by a vocoder. On the basis of traditional voice conversion methods, the rhythm and emotion information of the speaker is introduced, so that the converted voice carries the emotion and rhythm of the target speaker's voice. The method achieves high similarity and high voice quality in many-to-many speaker voice conversion tasks such as intra-set and extra-set pairing.
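The abstract does not specify the form of the attention mechanism, so the following is only a sketch under stated assumptions. It assumes the style coding layer attends over a small bank of learned style tokens, using a time-averaged summary of the reference features as the query. The resulting style vector is then tiled and concatenated with each acoustic frame before it would enter the self-coding network. All names (`style_vector`, `condition_frames`, `tokens`, `w_query`, `w_key`) and all dimensions are hypothetical, and random matrices stand in for trained parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def style_vector(reference, tokens, w_query, w_key):
    """Attention-pool a bank of style tokens (assumed design, not from the source).

    reference: (n_frames, n_mels) acoustic features of the speaker
    tokens:    (n_tokens, style_dim) style token bank
    w_query:   (n_mels, att_dim), w_key: (style_dim, att_dim)
    Returns the style vector (style_dim,) and attention weights (n_tokens,).
    """
    query = np.tanh(reference.mean(axis=0) @ w_query)   # (att_dim,)
    keys = np.tanh(tokens @ w_key)                      # (n_tokens, att_dim)
    scores = keys @ query / np.sqrt(query.size)         # scaled dot-product scores
    weights = softmax(scores)
    return weights @ tokens, weights

def condition_frames(mel, style):
    """Tile the style vector and concatenate it onto every acoustic frame."""
    tiled = np.broadcast_to(style, (mel.shape[0], style.size))
    return np.concatenate([mel, tiled], axis=1)

# Usage with random stand-ins for features and trained parameters.
rng = np.random.default_rng(0)
mel = rng.standard_normal((59, 80))
tokens = rng.standard_normal((10, 16))
w_query = rng.standard_normal((80, 32))
w_key = rng.standard_normal((16, 32))

style, weights = style_vector(mel, tokens, w_query, w_key)
encoder_input = condition_frames(mel, style)  # shape (59, 96)
```

Concatenating one utterance-level vector onto every frame is one simple way to feed both inputs to a network together. The source does not say whether the invention combines them this way.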

Description

Technical field

[0001] The invention relates to the technical field of speech processing, and in particular to a voice conversion method and device with emotion and rhythm.

Background technique

[0002] Voice conversion is a speech technology that retains the content information of the source speaker's speech while converting it into the target speaker's voice. This technology has a wide range of application scenarios. For example, users can convert their own voices into the voices of their favorite celebrities, or use something like the "voice-changing bow" that anime fans talk about. It is also of great significance in fields such as speech synthesis, voiceprint recognition, and voiceprint security.

[0003] Existing speech conversion methods have developed from parallel training data to non-parallel training data, and from one-to-many conversion to many-to-many conversion. There are several ways to achieve this. One is to align the speech features and parameters of non-parallel corpora by some method ...


Application Information

IPC(8): G10L19/16, G10L17/02, G10L17/04, G10L17/18, G06N3/04, G06N3/08
CPC: G10L19/173, G10L17/02, G10L17/04, G10L17/18, G06N3/08, G06N3/045
Inventor: 朱海, 王昆, 周琳珉
Owner SICHUAN CHANGHONG ELECTRIC CO LTD