Many-to-many speech conversion system based on VAE and i-vector under the condition of non-parallel text

A non-parallel voice conversion technology in the field of signal processing. It addresses the problem that the speaker (personality) similarity of the converted voice is not ideal.

Active Publication Date: 2021-09-14

NANJING UNIV OF POSTS & TELECOMM

7 Cites, 0 Cited by


Problems solved by technology

However, the one-hot feature is only a speaker identity label and carries no rich information about the speaker's individual characteristics. As a result, the converted speech produced by a VAE model conditioned on one-hot features is not sufficiently similar to the target speaker, which is the main shortcoming of that algorithm.
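The limitation of one-hot labels can be illustrated with a minimal sketch. It is not taken from the patent: the speaker count and helper names are hypothetical, chosen only to show that every pair of one-hot labels is equally far apart, so the labels cannot express that two voices resemble each other.

```python
import math

def one_hot(index, num_speakers):
    """Return a one-hot identity label for the speaker at `index`."""
    return [1.0 if i == index else 0.0 for i in range(num_speakers)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

NUM_SPEAKERS = 4  # hypothetical count, for illustration only
labels = [one_hot(i, NUM_SPEAKERS) for i in range(NUM_SPEAKERS)]

# Distance between every pair of distinct speaker labels.
distances = {
    (i, j): euclidean(labels[i], labels[j])
    for i in range(NUM_SPEAKERS)
    for j in range(i + 1, NUM_SPEAKERS)
}

# Collapse to the set of distinct distance values (rounded to absorb float noise).
distinct = {round(d, 12) for d in distances.values()}
```

Here `distinct` holds a single value, the square root of 2, regardless of which two speakers are compared. A continuous speaker embedding does not have this restriction.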


Examples


Embodiment 1

[0027] Referring to Figure 1 and Figure 2, this embodiment provides a many-to-many speech conversion system based on VAE and i-vector under non-parallel text conditions. It is divided into two stages, training and conversion:

[0028] 1. Speaker speech training stage

[0029] 1.1 Obtain the training corpus. The speech library used here is VCC2018, which contains 8 source speakers and 4 target speakers. The training corpus is divided into two groups: 4 male speakers and 4 female speakers. For each speaker, 81 sentences are used as the training corpus for full training, and 35 sentences are used as the test corpus for model evaluation;
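The per-speaker split in step 1.1 could be sketched as follows. The 81/35 counts come from the text; the speaker IDs, file names, and the choice to take the first 81 utterances for training are assumptions made only for illustration.

```python
TRAIN_PER_SPEAKER = 81  # from the text
TEST_PER_SPEAKER = 35   # from the text

def split_corpus(utterances_by_speaker):
    """Split each speaker's utterances into training and test lists.

    Assumption: the first 81 utterances go to training and the next 35
    to testing; the source does not say how sentences are selected.
    """
    needed = TRAIN_PER_SPEAKER + TEST_PER_SPEAKER
    train, test = {}, {}
    for speaker, utterances in utterances_by_speaker.items():
        if len(utterances) < needed:
            raise ValueError(f"{speaker}: need {needed} utterances, got {len(utterances)}")
        train[speaker] = utterances[:TRAIN_PER_SPEAKER]
        test[speaker] = utterances[TRAIN_PER_SPEAKER:needed]
    return train, test

# Hypothetical speaker IDs and file names: 4 male and 4 female speakers.
speakers = [f"M{i}" for i in range(1, 5)] + [f"F{i}" for i in range(1, 5)]
corpus = {s: [f"{s}_{n:03d}.wav" for n in range(1, 117)] for s in speakers}

train_set, test_set = split_corpus(corpus)
```

Keeping the two lists disjoint per speaker means the evaluation sentences are never seen during training.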

[0030] 1.2 Use the speech analysis and synthesis model WORLD to extract the speech features of each frame of the speaker's sentences: the spectral envelope sp', the logarithmic fundamental frequency log f0, and the harmonic spectral envelope ap. Calculate the energy en of each frame of speech, and recalculate the spectral envelope, i.e. sp...
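As a rough sketch of the per-frame bookkeeping in step 1.2, the function below derives log f0 and a frame energy from already-extracted analysis output. It does not call WORLD itself. Two assumptions are made that the text does not confirm: the energy en is taken as the sum of the spectral envelope bins in a frame, and unvoiced frames are marked by f0 = 0. The recalculated envelope is left out because its definition is truncated in the text.

```python
import math

def frame_features(sp_raw, f0):
    """Compute per-frame energy and log f0.

    sp_raw: list of frames, each a list of non-negative spectral envelope bins (sp').
    f0:     list of fundamental frequencies in Hz, one per frame.
    """
    if len(sp_raw) != len(f0):
        raise ValueError("sp_raw and f0 must have the same number of frames")
    features = []
    for envelope, pitch in zip(sp_raw, f0):
        # ASSUMPTION: energy is the sum of the envelope bins; the source
        # only says the energy of each frame is calculated.
        energy = sum(envelope)
        # ASSUMPTION: f0 == 0 marks an unvoiced frame, where log f0 is undefined.
        log_f0 = math.log(pitch) if pitch > 0 else None
        features.append({"en": energy, "logf0": log_f0})
    return features

# Toy input: two frames with three envelope bins each (hypothetical values).
feats = frame_features([[1.0, 2.0, 3.0], [0.5, 0.5, 1.0]], [100.0, 0.0])
```

Returning `None` for unvoiced frames keeps them distinguishable from voiced frames in later processing.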


Abstract

The present invention provides a many-to-many speech conversion system based on VAE and the identity feature vector (i-vector) under non-parallel text conditions. It realizes speech conversion with a variational autoencoder (VAE) model on a non-parallel corpus, and adds the speaker identity feature i-vector to the speaker representation, which can effectively improve the speaker (personality) similarity of the converted speech. The invention has three advantages: 1) the dependence on parallel text is removed, and training requires no alignment operation; 2) the conversion systems for multiple source-target speaker pairs can be integrated into a single conversion model, realizing many-to-many conversion; 3) the i-vector features enrich the speaker identity information, which effectively improves the personality similarity of the converted speech and improves conversion performance.
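One way to picture the speaker representation described in the abstract is the sketch below. It assumes that the one-hot label and the i-vector are concatenated and then appended to the latent code before decoding. The abstract states only that the i-vector is added to the speaker representation, so the concatenation, the vector sizes, and all names and values here are hypothetical.

```python
def speaker_representation(one_hot_label, i_vector):
    """Combine the identity label with the i-vector (ASSUMED: by concatenation)."""
    return list(one_hot_label) + list(i_vector)

def decoder_input(latent, speaker_repr):
    """Build the decoder input from a latent code and a speaker representation."""
    return list(latent) + list(speaker_repr)

# Hypothetical toy values; real i-vectors have far more dimensions.
speakers = {
    "src": {"label": [1.0, 0.0], "ivec": [0.2, -0.1, 0.4]},
    "tgt": {"label": [0.0, 1.0], "ivec": [-0.3, 0.5, 0.1]},
}
latent = [0.7, -1.2]  # stand-in for the encoder output of one source frame

reprs = {
    name: speaker_representation(s["label"], s["ivec"])
    for name, s in speakers.items()
}

# Training-style reconstruction pairs the latent code with the source speaker.
reconstruct_in = decoder_input(latent, reprs["src"])
# Conversion keeps the same latent code and swaps in the target speaker.
convert_in = decoder_input(latent, reprs["tgt"])
```

Because the target is selected only through this vector, a single model of this kind can in principle serve any source-target pair among the training speakers, which is what the many-to-many claim relies on.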

Description

Technical field

[0001] The invention belongs to the technical field of signal processing, and in particular relates to a many-to-many speech conversion system based on VAE and i-vector under non-parallel text conditions.

Background

[0002] After years of research on speech conversion, many classic conversion methods have emerged, including the Gaussian mixture model (GMM), frequency warping, deep neural networks (DNN), and unit-selection-based methods. However, most of these methods need parallel corpora for training in order to establish conversion rules between the spectral features of the source speech and the target speech.

[0003] The speech conversion method based on the variational autoencoder (VAE) model builds a speech conversion system directly from the speaker's identity label. This speech conversion system does not need to align the speech frames of the source speaker and the target speaker during model training...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G10L21/013, G10L25/18, G10L25/21, G10L25/30, G10L13/02

CPC: G10L13/02, G10L21/013, G10L25/18, G10L25/21, G10L25/30, G10L2021/0135

Inventors: 李燕萍, 许吉良, 张燕

Owner: NANJING UNIV OF POSTS & TELECOMM
