VAE (Variational Autoencoder)-based voice conversion method under non-parallel corpus training

A voice conversion technique for non-parallel corpora, applied in speech synthesis, speech analysis, and related instruments. It addresses the sharp increase in computation, the inflexibility, and the unsuitability for limited computing resources and equipment of earlier methods, with the effect of improving flexibility, training efficiency, and usability.

Active Publication Date: 2018-11-09

NANJING UNIV OF POSTS & TELECOMM

Cites: 12; Cited by: 41


AI Technical Summary

Problems solved by technology

However, this type of algorithm still has some defects. For example, the classic approach of using a Gaussian mixture model (GMM) for voice conversion is mostly designed for one-to-one conversion tasks and requires the source speaker and the target speaker to record the same training sentences. Only after the spectral features have been aligned frame by frame with dynamic time warping (DTW) can model training learn the mapping between them, so such methods are not flexible enough in practical applications. In addition, when the GMM is used to train the mapping function with global variables taken into account and the training data are iterated over, the amount of computation increases sharply; and the GMM achieves good conversion quality only when training data are plentiful, which makes it unsuitable for devices with limited computing resources.
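For readers unfamiliar with the alignment step that the GMM baseline depends on, here is a minimal DTW sketch. It is a generic textbook formulation, not code from the patent; the function name `dtw_align` and the Euclidean frame distance are illustrative assumptions.

```python
import math

def dtw_align(src, tgt):
    """Align two sequences of feature frames with classic DTW.

    src, tgt: lists of equal-length feature vectors (e.g. per-frame spectra).
    Returns (total_cost, path), where path lists aligned (i, j) frame pairs.
    """
    n, m = len(src), len(tgt)
    # cost[i][j]: minimal accumulated distance aligning src[:i] with tgt[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # source frame advances
                                 cost[i][j - 1],      # target frame advances
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the final cell to recover the alignment path
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path

# Usage: the target "speaker" holds the first frame one step longer.
cost, path = dtw_align([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [2.0]])
print(cost, path)  # 0.0 [(0, 0), (0, 1), (1, 2), (2, 3)]
```

The quadratic table over every source and target frame pair is what makes parallel, aligned training data a prerequisite for this family of methods; the non-parallel approach described below avoids the step entirely.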


Embodiment Construction

[0049] The technical solutions of the present invention will be further elaborated below according to the drawings and in conjunction with the embodiments.

[0050] The present invention adopts the following technical scheme: a VAE-based voice conversion method under non-parallel corpus training. The mel-cepstral features of the speech are extracted with the AHOcoder speech codec and, on the MATLAB platform, concatenated with their first-order and second-order differences; the feature parameters of the preceding and following frames are then spliced onto each frame to form a joint feature parameter x_n. Using x_n as training data, a DNN is trained on a speaker recognition task. After the network has been trained to convergence, x_n is fed into the DNN frame by frame and the output of the Bottleneck layer is taken for each frame; this output is the Bottleneck feature parameter b containing the...
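The feature pipeline in [0050] can be sketched in Python with NumPy. This is an illustrative reconstruction under stated assumptions, not the inventors' MATLAB code: the mel-cepstra are taken as already extracted (AHOcoder is not called), the differences are simple backward differences, one neighbouring frame is spliced on each side, and the layer sizes, ReLU activation, 64-dimensional bottleneck, and random weights are hypothetical placeholders for a DNN that would in practice be trained on speaker recognition. The names `add_deltas`, `splice`, and `bottleneck_features` are hypothetical.

```python
import numpy as np

def add_deltas(mcep):
    """Append first- and second-order differences to per-frame features.

    mcep: (T, D) array of mel-cepstral frames. Returns a (T, 3*D) array.
    Assumption: backward differences, with the first frame repeated at the edge.
    """
    delta = np.diff(mcep, axis=0, prepend=mcep[:1])
    delta2 = np.diff(delta, axis=0, prepend=delta[:1])
    return np.concatenate([mcep, delta, delta2], axis=1)

def splice(frames, context=1):
    """Stack each frame with `context` neighbours on each side (edge-padded)."""
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    T = frames.shape[0]
    windows = [padded[k:k + T] for k in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)  # order: t-context ... t+context

def bottleneck_features(x, weights):
    """Forward spliced frames through the hidden layers up to the bottleneck.

    weights: (W, bias) pairs for the layers up to and including the bottleneck;
    the speaker-classification layers after it are only needed for training.
    """
    h = x
    for i, (W, bias) in enumerate(weights):
        h = h @ W + bias
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers (assumed activation)
    return h  # linear bottleneck output, one row per frame

# Usage with hypothetical sizes: 100 frames of 25-dim mel-cepstra.
rng = np.random.default_rng(0)
mcep = rng.standard_normal((100, 25))
x_n = splice(add_deltas(mcep), context=1)      # joint feature parameter x_n
sizes = [x_n.shape[1], 512, 512, 64]           # 64-dim bottleneck (assumed)
weights = [(rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]  # untrained placeholders
b = bottleneck_features(x_n, weights)          # Bottleneck feature parameter b
print(x_n.shape, b.shape)  # (100, 225) (100, 64)
```

Splicing neighbouring frames gives the speaker-recognition DNN some temporal context per input row, while the narrow bottleneck layer yields a compact per-frame representation to pass on to the VAE stage.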


Abstract

The invention discloses a VAE (Variational Autoencoder)-based voice conversion method for non-parallel corpus training. Under non-parallel text conditions, Bottleneck features are extracted with a deep neural network, a conversion function is then learned and modeled with a VAE model, and in the conversion stage many-to-many speaker conversion is achieved. The method has three advantages: 1) it removes the dependence on parallel text, so no alignment operation is needed during training; 2) the conversion systems for multiple source-target speaker pairs are integrated into a single conversion model, achieving many-to-many conversion; and 3) a many-to-many conversion system that works under non-parallel text conditions provides technical support for bringing voice conversion into practical voice interaction.
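The VAE architecture is not spelled out in this excerpt. One common way to obtain many-to-many conversion from a single VAE is to condition the decoder on a speaker code: the encoder maps a frame to a latent Gaussian, the decoder reconstructs the frame from a latent sample plus the speaker code, and swapping in the target speaker's code at conversion time changes the speaker identity. The NumPy sketch below shows that conditional-VAE forward pass and loss under these assumptions only; the class `ToyCVAE`, the one-hot speaker codes, the single linear layers, and all dimensions are hypothetical, training (gradient updates) is omitted, and the patent may condition on different inputs, such as the Bottleneck features b.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(n_in, n_out):
    """Hypothetical untrained linear layer: (weight, bias)."""
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)

class ToyCVAE:
    """Untrained conditional VAE: x -> q(z|x); [z, speaker code] -> x_hat."""

    def __init__(self, x_dim, z_dim, c_dim):
        self.enc = linear(x_dim, 2 * z_dim)      # outputs [mu, log_var]
        self.dec = linear(z_dim + c_dim, x_dim)  # decoder sees z and speaker code
        self.z_dim = z_dim

    def encode(self, x):
        W, bias = self.enc
        h = x @ W + bias
        return h[:, :self.z_dim], h[:, self.z_dim:]  # mu, log_var

    def decode(self, z, c):
        W, bias = self.dec
        return np.concatenate([z, c], axis=1) @ W + bias

    def loss(self, x, c):
        """Negative ELBO (up to constants): squared error + KL to N(0, I)."""
        mu, log_var = self.encode(x)
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * log_var) * eps      # reparameterization trick
        recon = np.sum((x - self.decode(z, c)) ** 2, axis=1)
        kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)
        return np.mean(recon + kl)

    def convert(self, x_src, c_tgt):
        """Encode source frames, decode with the target speaker's code."""
        mu, _ = self.encode(x_src)  # posterior mean, no sampling at conversion
        return self.decode(mu, c_tgt)

# Usage: 4 hypothetical speakers, 75-dim frames, one model for all pairs.
n_speakers, x_dim = 4, 75
model = ToyCVAE(x_dim=x_dim, z_dim=16, c_dim=n_speakers)
x_src = rng.standard_normal((10, x_dim))
c_src = np.eye(n_speakers)[[0] * 10]   # one-hot code for speaker 0
c_tgt = np.eye(n_speakers)[[2] * 10]   # one-hot code for speaker 2
print(model.loss(x_src, c_src) >= 0)   # True: both loss terms are non-negative
y = model.convert(x_src, c_tgt)
print(y.shape)  # (10, 75)
```

Because the speaker identity enters only through the code vector, converting between any pair of training speakers reuses the same encoder and decoder, which is the sense in which several source-target conversion systems collapse into one model. A real system would minimise `loss` with a gradient-based optimiser before using `convert`.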

Description

Technical field

[0001] The invention belongs to the field of speech signal processing and relates in particular to a voice conversion method based on a variational autoencoder (VAE) model under non-parallel corpus training.

Background

[0002] Voice conversion is a research branch of speech signal processing that draws on speaker recognition, speech recognition, and speech synthesis. Its aim is to change the speaker-specific (personalized) information in speech while keeping the original semantic content unchanged, so that the speech of one particular speaker (i.e., the source speaker) sounds like the speech of another particular speaker (i.e., the target speaker). The main tasks of voice conversion are to extract the characteristic parameters of the two speakers' voices, map the source parameters to the target, and then decode and reconstruct the converted parameters into speech. In this process, it is necessary to ensure the accuracy o...


Application Information


IPC(8): G10L13/02; G10L13/08; G10L21/007; G10L19/02; G10L25/24; G10L25/30

CPC: G10L13/02; G10L13/08; G10L19/02; G10L21/007; G10L25/24; G10L25/30

Inventors: 李燕萍 (Li Yanping), 凌云志 (Ling Yunzhi)

Owner NANJING UNIV OF POSTS & TELECOMM
