Multi-to-multi speech conversion method based on text encoder under non-parallel text conditions

A voice conversion and encoder technology, applied in speech analysis, speech recognition, and speech synthesis, addressing the problems of difficult implementation, difficulty in determining the number of GMM clusters, and low conversion quality; the effect is to improve speech quality and similarity, improve generality and practicality, and achieve high-quality voice conversion.

Active Publication Date: 2019-02-12
NANJING UNIV OF POSTS & TELECOMM

Problems solved by technology

However, due to the over-regularization effect on the latent variables of the VAE, the latent-variable distribution becomes too simplistic to represent the underlying structure of the semantic content. As a result, the quality of speech converted by this VAE-based non-parallel-corpus method is lower than that of DNN conversion trained on a parallel corpus.
A more complex prior distribution over the latent variables, such as a GMM, could in principle address this problem; however, because the semantic content varies greatly, the number of GMM clusters is hard to determine, making this approach very difficult to implement.
At present, VAE-based non-parallel-corpus voice conversion methods therefore suffer from poor speech quality and high noise in the converted speech.
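The over-regularization described above comes from the KL-divergence term of the VAE objective, which pulls every latent posterior toward the same standard-normal prior. A minimal sketch of that term (the function name and toy values are illustrative, not from the patent):

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims.

    This is the VAE regularizer: it pulls each posterior toward the
    shared standard-normal prior, which can over-simplify the latent
    space and flatten the structure of the semantic content.
    """
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, log_var))

# A posterior already matching the prior pays no KL cost ...
assert kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]) == 0.0
# ... while an informative, sharply peaked posterior is penalized,
# so minimizing the loss pushes latents back toward the bland prior.
penalty = kl_to_standard_normal([2.0, -1.0], [-2.0, -2.0])
```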




Embodiment Construction

[0042] As shown in Figure 1, the high-quality speech conversion method of the present invention is divided into two parts: a training part, which obtains the model parameters and conversion functions required for speech conversion, and a conversion part, which realizes the conversion from the source speaker's speech to the target speaker's speech.

[0043] The implementation steps of the training phase are:

[0044] 1.1) Obtain a non-parallel text training corpus containing multiple speakers, including the source speaker and the target speaker. The training corpus is taken from the VCC2018 speech corpus; the non-parallel text training corpora of 4 male and 4 female speakers are selected, with 81 sentences per speaker. The corpus also contains the semantic content of each sentence. The training corpora of the source and target speakers may be either parallel or non-parallel text.
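The training/conversion split described above can be illustrated with the standard fundamental-frequency (F0) side computation used in VC pipelines of this kind: the training part estimates per-speaker log-F0 statistics, and the conversion part applies a mean-variance transform (the spectral envelope itself would be converted by the neural model). This is a minimal sketch; the function names are assumptions, not the patent's notation:

```python
import math

def train_logf0_stats(f0_values):
    """Training part: estimate a speaker's log-F0 mean and std
    from voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)
    return mu, math.sqrt(var)

def convert_f0(f0, src_stats, tgt_stats):
    """Conversion part: map a source F0 value into the target
    speaker's log-F0 distribution (mean-variance transform)."""
    if f0 <= 0:                     # unvoiced frames pass through
        return 0.0
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    z = (math.log(f0) - mu_s) / sd_s
    return math.exp(z * sd_t + mu_t)

src = train_logf0_stats([100.0, 120.0])   # toy source-speaker F0s (Hz)
tgt = train_logf0_stats([200.0, 240.0])   # toy target-speaker F0s (Hz)
converted = convert_f0(100.0, src, tgt)
```

Because the transform only needs each speaker's own statistics, it requires no frame alignment between speakers, matching the non-parallel setting.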


Abstract

The invention discloses a many-to-many voice conversion method based on a text encoder under non-parallel text conditions. The method includes a training part and a conversion part. A conditional variational autoencoder combined with a Wasserstein generative adversarial network (VAWGAN) and a text encoder (Text-Encoder) are used to implement the speech conversion system. A sentence embedding representing semantics is added to the VAWGAN, which improves the speech quality and speaker similarity of the converted speech and realizes high-quality voice conversion. The method relieves the dependence on parallel texts, realizes voice conversion under non-parallel text conditions, and requires no alignment process during training, improving the versatility and practicability of the speech conversion system. The conversion systems of multiple source/target speaker pairs can be integrated into one conversion model, i.e., many-to-many conversion is realized. The method has good application prospects in fields such as film dubbing, speech translation, and speech synthesis.
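The key idea in the abstract is that the decoder is conditioned on three things: the speaker-independent latent, a speaker label, and the Text-Encoder's sentence embedding carrying semantics. A minimal sketch of assembling that conditioning vector (the concatenation order, helper name, and dimensions are illustrative assumptions, not the patent's specification):

```python
def decoder_input(latent_z, speaker_id, n_speakers, sentence_embedding):
    """Concatenate the decoder's conditioning vector:
    - latent_z: speaker-independent latent from the encoder
    - one-hot speaker label selecting the (target) voice
    - sentence_embedding: semantic conditioning from the Text-Encoder
    """
    one_hot = [1.0 if i == speaker_id else 0.0 for i in range(n_speakers)]
    return list(latent_z) + one_hot + list(sentence_embedding)

# Retargeting the voice only changes the one-hot slot; the latent and
# sentence embedding (the content) are untouched, which is why one
# model can serve many source/target speaker pairs.
vec = decoder_input([0.1, 0.2], speaker_id=1, n_speakers=3,
                    sentence_embedding=[0.5])
```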

Description

technical field
[0001] The invention relates to a voice conversion method, in particular to a text-encoder-based many-to-many voice conversion method under non-parallel text conditions.
Background technique
[0002] Voice Conversion (VC) is a technique that converts the features of a source speaker into those of a target speaker while preserving the semantic information. In recent research, VC models use Deep Neural Networks (DNN) to convert source speech parameters into target speech parameters; compared with the traditional Gaussian Mixture Model (GMM), a DNN can transform voice features more effectively.
[0003] Recently, the Variational Auto-Encoder (VAE) has been used for non-parallel VC models because a VAE is easier to train than a restricted Boltzmann machine. In traditional VAE-based non-parallel VC, the encoder extracts speaker-independent latent variables representing the semantic content from the input speech parameters, and then the decoder reconstructs the...
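The encoder/decoder division of labor in paragraph [0003] can be caricatured in a few lines: the encoder strips away what is speaker-specific to leave a content-only latent, and the decoder re-applies the characteristics of whichever speaker label conditions it. This toy class is an illustration of the idea only, not the patent's actual network:

```python
class ToyConditionalVC:
    """Toy stand-in for VAE-based VC: per-speaker offsets play the
    role of learned speaker characteristics. encode() removes them
    (speaker-independent latent); decode() adds back the offsets of
    the speaker it is conditioned on."""

    def __init__(self, speaker_offsets):
        self.offsets = speaker_offsets   # speaker name -> per-dim offset

    def encode(self, frame, speaker):
        off = self.offsets[speaker]
        return [x - o for x, o in zip(frame, off)]

    def decode(self, latent, speaker):
        off = self.offsets[speaker]
        return [z + o for z, o in zip(latent, off)]

vc = ToyConditionalVC({"src": [1.0, -0.5], "tgt": [3.0, 0.5]})
latent = vc.encode([2.0, 0.0], "src")    # speaker-independent content
converted = vc.decode(latent, "tgt")     # same content, target "voice"
```

Conversion is simply encode-with-source, decode-with-target; no paired (aligned) utterances are ever compared, which is what makes the non-parallel setting possible.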

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L15/06; G10L25/18; G10L25/30; G10L21/003; G10L17/04; G10L13/02
CPC: G10L13/02; G10L15/06; G10L15/063; G10L17/04; G10L21/003; G10L25/18; G10L25/30
Inventor: 李燕萍, 石杨, 张燕
Owner: NANJING UNIV OF POSTS & TELECOMM