Multi-to-multi speech conversion method based on text encoder under non-parallel text conditions

A voice conversion and encoder technology, applied in speech analysis, speech recognition, and speech synthesis, addressing the problems of difficult implementation, difficulty in determining the number of GMM clusters, and low conversion quality; the effect is to improve speech quality and similarity, improve generality and practicality, and achieve high-quality voice conversion.

Active Publication Date: 2019-02-12
NANJING UNIV OF POSTS & TELECOMM

Problems solved by technology

However, due to the over-regularization effect on the latent variables of the VAE, the latent-variable distribution becomes too simplistic to represent the underlying structure of the semantic content. As a result, the quality of speech converted by this VAE-based non-parallel-corpus method is lower than that of DNN conversion trained on a parallel corpus.
A more complex prior distribution over the latent variables, such as a GMM, could in principle address this problem; however, because the semantic content varies greatly, the number of GMM clusters is hard to determine, making this approach very difficult to implement.
At present, VAE-based non-parallel-corpus voice conversion methods therefore suffer from poor speech quality and high noise in the converted speech.
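The over-regularization described above comes from the KL-divergence term of the VAE objective, which pulls every latent posterior toward the same standard-normal prior. A minimal sketch of that term (the function name and toy values are illustrative, not from the patent):

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims.

    This is the VAE regularizer: it pulls each posterior toward the
    shared standard-normal prior, which can over-simplify the latent
    space and flatten the structure of the semantic content.
    """
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, log_var))

# A posterior already matching the prior pays no KL cost ...
assert kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]) == 0.0
# ... while an informative, sharply peaked posterior is penalized,
# so minimizing the loss pushes latents back toward the bland prior.
penalty = kl_to_standard_normal([2.0, -1.0], [-2.0, -2.0])
```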




Embodiment Construction

[0042] As shown in Figure 1, the high-quality speech conversion method of the present invention is divided into two parts: a training part, which obtains the model parameters and conversion functions required for speech conversion, and a conversion part, which realizes the conversion from the source speaker's speech to the target speaker's speech.

[0043] The implementation steps of the training phase are:

[0044] 1.1) Obtain a non-parallel text training corpus containing multiple speakers, including the source speaker and the target speaker. The training corpus is taken from the VCC2018 speech corpus; the non-parallel text training corpora of 4 male and 4 female speakers are selected, with 81 sentences per speaker. The corpus also contains the semantic content of each sentence. The training corpora of the source and target speakers may be either parallel or non-parallel text.
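The training/conversion split described above can be illustrated with the standard fundamental-frequency (F0) side computation used in VC pipelines of this kind: the training part estimates per-speaker log-F0 statistics, and the conversion part applies a mean-variance transform (the spectral envelope itself would be converted by the neural model). This is a minimal sketch; the function names are assumptions, not the patent's notation:

```python
import math

def train_logf0_stats(f0_values):
    """Training part: estimate a speaker's log-F0 mean and std
    from voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)
    return mu, math.sqrt(var)

def convert_f0(f0, src_stats, tgt_stats):
    """Conversion part: map a source F0 value into the target
    speaker's log-F0 distribution (mean-variance transform)."""
    if f0 <= 0:                     # unvoiced frames pass through
        return 0.0
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    z = (math.log(f0) - mu_s) / sd_s
    return math.exp(z * sd_t + mu_t)

src = train_logf0_stats([100.0, 120.0])   # toy source-speaker F0s (Hz)
tgt = train_logf0_stats([200.0, 240.0])   # toy target-speaker F0s (Hz)
converted = convert_f0(100.0, src, tgt)
```

Because the transform only needs each speaker's own statistics, it requires no frame alignment between speakers, matching the non-parallel setting.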


Abstract

The invention discloses a many-to-many voice conversion method based on a text encoder under non-parallel text conditions. The method includes a training part and a conversion part. A conditional variational autoencoder combined with a Wasserstein generative adversarial network (VAWGAN) and a text encoder (Text-Encoder) are used to implement the speech conversion system. A sentence embedding representing semantics is added to the VAWGAN, which improves the speech quality and speaker similarity of the converted speech and realizes high-quality voice conversion. The method relieves the dependence on parallel texts, realizes voice conversion under non-parallel text conditions, and requires no alignment process during training, improving the versatility and practicability of the speech conversion system. The conversion systems of multiple source/target speaker pairs can be integrated into one conversion model, i.e., many-to-many conversion is realized. The method has good application prospects in fields such as film dubbing, speech translation, and speech synthesis.
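The key idea in the abstract is that the decoder is conditioned on three things: the speaker-independent latent, a speaker label, and the Text-Encoder's sentence embedding carrying semantics. A minimal sketch of assembling that conditioning vector (the concatenation order, helper name, and dimensions are illustrative assumptions, not the patent's specification):

```python
def decoder_input(latent_z, speaker_id, n_speakers, sentence_embedding):
    """Concatenate the decoder's conditioning vector:
    - latent_z: speaker-independent latent from the encoder
    - one-hot speaker label selecting the (target) voice
    - sentence_embedding: semantic conditioning from the Text-Encoder
    """
    one_hot = [1.0 if i == speaker_id else 0.0 for i in range(n_speakers)]
    return list(latent_z) + one_hot + list(sentence_embedding)

# Retargeting the voice only changes the one-hot slot; the latent and
# sentence embedding (the content) are untouched, which is why one
# model can serve many source/target speaker pairs.
vec = decoder_input([0.1, 0.2], speaker_id=1, n_speakers=3,
                    sentence_embedding=[0.5])
```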

Description

technical field
[0001] The invention relates to a voice conversion method, in particular to a text-encoder-based many-to-many voice conversion method under non-parallel text conditions.
Background technique
[0002] Voice Conversion (VC) is a technique that converts the features of a source speaker into those of a target speaker while preserving the semantic information. In recent research, VC models use Deep Neural Networks (DNN) to convert source speech parameters into target speech parameters; compared with the traditional Gaussian Mixture Model (GMM), a DNN can transform voice features more effectively.
[0003] Recently, the Variational Auto-Encoder (VAE) has been used for non-parallel VC models because a VAE is easier to train than a restricted Boltzmann machine. In traditional VAE-based non-parallel VC, the encoder extracts speaker-independent latent variables representing the semantic content from the input speech parameters, and then the decoder reconstructs the...
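The encoder/decoder division of labor in paragraph [0003] can be caricatured in a few lines: the encoder strips away what is speaker-specific to leave a content-only latent, and the decoder re-applies the characteristics of whichever speaker label conditions it. This toy class is an illustration of the idea only, not the patent's actual network:

```python
class ToyConditionalVC:
    """Toy stand-in for VAE-based VC: per-speaker offsets play the
    role of learned speaker characteristics. encode() removes them
    (speaker-independent latent); decode() adds back the offsets of
    the speaker it is conditioned on."""

    def __init__(self, speaker_offsets):
        self.offsets = speaker_offsets   # speaker name -> per-dim offset

    def encode(self, frame, speaker):
        off = self.offsets[speaker]
        return [x - o for x, o in zip(frame, off)]

    def decode(self, latent, speaker):
        off = self.offsets[speaker]
        return [z + o for z, o in zip(latent, off)]

vc = ToyConditionalVC({"src": [1.0, -0.5], "tgt": [3.0, 0.5]})
latent = vc.encode([2.0, 0.0], "src")    # speaker-independent content
converted = vc.decode(latent, "tgt")     # same content, target "voice"
```

Conversion is simply encode-with-source, decode-with-target; no paired (aligned) utterances are ever compared, which is what makes the non-parallel setting possible.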

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L15/06; G10L25/18; G10L25/30; G10L21/003; G10L17/04; G10L13/02
CPC: G10L13/02; G10L15/06; G10L15/063; G10L17/04; G10L21/003; G10L25/18; G10L25/30
Inventor: 李燕萍, 石杨, 张燕
Owner: NANJING UNIV OF POSTS & TELECOMM