VAE (Variational Autoencoder)-based voice conversion method under non-parallel corpus training

A speech conversion technology for non-parallel corpora, applied in speech synthesis, speech analysis and related instruments. It addresses problems such as a sudden increase in computational load, inflexibility, and unsuitability for the available computing resources and equipment. Its effects are improved flexibility, usability and training efficiency.

Active Publication Date: 2018-11-09
NANJING UNIV OF POSTS & TELECOMM
Cites: 12; Cited by: 41

AI Technical Summary

Problems solved by technology

However, this type of algorithm still has some defects. For example, the classic Gaussian mixture model (GMM) approach to speech conversion is mostly designed for one-to-one conversion tasks and requires the source speaker and the target speaker to record the same training sentences. Only after the spectral features have been aligned frame by frame with Dynamic Time Warping (DTW) can the mapping relationship between the spectral features be obtained through...
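The frame-by-frame alignment that parallel-corpus methods depend on can be illustrated with classic DTW. The following is a minimal textbook sketch, not code from the patent: it assumes Euclidean frame distances and the standard three-step recursion, and the function name `dtw_align` is hypothetical.

```python
import numpy as np

def dtw_align(src, tgt):
    """Align two feature sequences of shape (frames, dims) with classic DTW.

    Returns the optimal warping path as (src_index, tgt_index) pairs
    and the accumulated cost along that path.
    """
    n, m = len(src), len(tgt)
    # Euclidean distance between every source frame and every target frame.
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1],  # both sequences advance
                cost[i - 1, j],      # target frame is repeated
                cost[i, j - 1],      # source frame is repeated
            )
    # Backtrack from the end to recover the warping path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], cost[n, m]
```

The aligned frame pairs are what a GMM mapping would then be trained on; this pairing only exists when both speakers utter the same sentences, which is the dependence the VAE-based method is meant to remove.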

Examples

Embodiment Construction

[0049] The technical solutions of the present invention will be further elaborated below according to the drawings and in conjunction with the embodiments.

[0050] The present invention adopts the following technical scheme: a VAE-based speech conversion method under non-parallel corpus training. The Mel-cepstral features of the speech are extracted with the AHOcoder vocoder and, on the MATLAB platform, concatenated with their first-order and second-order differences; the features of the preceding and following frames are then stitched onto each frame to form the joint feature parameter x_n. Using x_n as training data, a DNN is trained on a speaker recognition task. Once training has converged, x_n is fed into the DNN frame by frame and the output of the Bottleneck layer is taken for each frame; this is the Bottleneck feature parameter b containing the...
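The feature pipeline in [0050] can be sketched in NumPy under several assumptions. The Mel-cepstra are taken as a given (frames, dims) array rather than produced by AHOcoder; the differences are simple frame differences with edge repetition, because the window used by the patent is not stated here; one context frame is stitched on each side (`context=1` is hypothetical); and the Bottleneck extractor assumes sigmoid hidden layers. All function names are hypothetical.

```python
import numpy as np

def frame_difference(feats):
    """First-order frame difference d[t] = x[t] - x[t-1], with d[0] = 0."""
    padded = np.vstack([feats[:1], feats])  # repeat the first frame
    return np.diff(padded, axis=0)

def joint_features(mcep, context=1):
    """Build the joint per-frame feature x_n from Mel-cepstra.

    mcep: array of shape (T, dims). context: frames stitched on each side
    (an assumed value; the source does not give the window size).
    """
    delta = frame_difference(mcep)       # first-order difference
    delta2 = frame_difference(delta)     # second-order difference
    static_dyn = np.hstack([mcep, delta, delta2])  # (T, 3 * dims)
    T = static_dyn.shape[0]
    # Repeat edge frames so every frame has a full context window.
    padded = np.pad(static_dyn, ((context, context), (0, 0)), mode="edge")
    # Windows are ordered previous frame(s), current frame, next frame(s).
    windows = [padded[k:k + T] for k in range(2 * context + 1)]
    return np.hstack(windows)            # (T, (2 * context + 1) * 3 * dims)

def bottleneck_features(x, layers, bottleneck_index):
    """Forward frames through a speaker-ID DNN, stopping at the bottleneck.

    layers: list of (W, b) pairs from a trained network (sigmoid assumed).
    Layers after bottleneck_index, including the speaker softmax, are skipped.
    """
    h = x
    for W, b in layers[: bottleneck_index + 1]:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return h
```

The speaker-classification output layer is only needed during training; at extraction time everything after the Bottleneck layer is discarded, so b is a compact per-frame vector shaped by the speaker recognition objective.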

Abstract

The invention discloses a VAE (Variational Autoencoder)-based voice conversion method under non-parallel corpus training. Under non-parallel text conditions, Bottleneck features are extracted with a deep neural network, and a VAE model is then used to learn and model the conversion function; in the conversion stage, many-to-many speaker conversion can be achieved. The method has three advantages: 1) it removes the dependence on parallel text, and no alignment operation is needed during training; 2) conversion systems for multiple source-target speaker pairs can be integrated into a single conversion model, realizing many-to-many conversion; and 3) a many-to-many conversion system under non-parallel text conditions provides technical support for bringing voice conversion into practical voice interaction.
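The conversion idea in the abstract can be sketched as a speaker-conditioned VAE: the encoder maps each frame to a latent z intended to carry content, and the decoder rebuilds the frame from z plus a speaker vector, so converting amounts to encoding source frames and decoding them with the target speaker's vector. This NumPy sketch is an assumption-laden illustration, not the patent's model. The layer sizes, tanh activations, one-hot speaker codes and every function name (`init_params`, `encode`, `decode`, `vae_loss`, `convert`) are hypothetical; the description suggests the speaker-dependent input may be derived from the Bottleneck features instead. Only the forward pass and loss are shown, since training would need gradients from an autodiff framework.

```python
import numpy as np

def init_params(x_dim, s_dim, z_dim=8, h_dim=32, seed=0):
    """Small random weights for a one-hidden-layer encoder and decoder."""
    r = np.random.default_rng(seed)
    g = lambda *shape: 0.1 * r.standard_normal(shape)
    return {
        "We": g(x_dim, h_dim), "be": np.zeros(h_dim),
        "Wmu": g(h_dim, z_dim), "bmu": np.zeros(z_dim),
        "Wlv": g(h_dim, z_dim), "blv": np.zeros(z_dim),
        "Wd": g(z_dim + s_dim, h_dim), "bd": np.zeros(h_dim),
        "Wo": g(h_dim, x_dim), "bo": np.zeros(x_dim),
    }

def encode(x, p):
    """Encoder q(z|x): returns the posterior mean and log-variance."""
    h = np.tanh(x @ p["We"] + p["be"])
    return h @ p["Wmu"] + p["bmu"], h @ p["Wlv"] + p["blv"]

def decode(z, speaker_code, p):
    """Decoder p(x|z, s): latent z concatenated with a speaker vector s."""
    s = np.broadcast_to(speaker_code, (z.shape[0], speaker_code.shape[-1]))
    h = np.tanh(np.hstack([z, s]) @ p["Wd"] + p["bd"])
    return h @ p["Wo"] + p["bo"]

def vae_loss(x, speaker_code, p, rng):
    """Negative ELBO: squared-error reconstruction plus KL to N(0, I)."""
    mu, logvar = encode(x, p)
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparameterization
    x_hat = decode(z, speaker_code, p)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
    return recon + kl

def convert(x_src, target_code, p):
    """Encode source frames, then decode them with the target speaker's code."""
    mu, _ = encode(x_src, p)  # posterior mean is used at conversion time
    return decode(mu, target_code, p)
```

Because the speaker identity enters only through the decoder input, a single trained model can serve any source-target pair among the training speakers, which is how one model can cover many-to-many conversion without parallel data or alignment.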

Description

Technical field
[0001] The invention belongs to the field of speech signal processing, and in particular relates to a speech conversion method based on a variational autoencoder (VAE) model under non-parallel corpus training.

Background
[0002] Speech conversion technology is a research branch of speech signal processing that spans speaker recognition, speech recognition and speech synthesis. It aims to change the speaker-specific (personalized) information of speech while keeping the original semantic information unchanged, so that speech from a particular speaker (the source speaker) sounds like speech from another particular speaker (the target speaker). The main tasks of speech conversion are extracting the characteristic parameters of the two speakers' voices, mapping the source parameters onto the target, and then decoding the transformed parameters to reconstruct the converted speech. In this process, it is necessary to ensure the accuracy o...

Application Information

IPC(8): G10L13/02, G10L13/08, G10L21/007, G10L19/02, G10L25/24, G10L25/30
CPC: G10L13/02, G10L13/08, G10L19/02, G10L21/007, G10L25/24, G10L25/30
Inventors: 李燕萍 (Li Yanping), 凌云志 (Ling Yunzhi)
Owner: NANJING UNIV OF POSTS & TELECOMM