Real-time voice conversion method under conditions of minimal amount of training data

A real-time voice conversion technology, applicable to speech synthesis, speech analysis, speech recognition, etc.

Status: Inactive
Publication Date: 2010-06-23
NANJING UNIV OF POSTS & TELECOMM
Cites: 0; Cited by: 46

AI Technical Summary

Problems solved by technology

[0005] At present, there has been no research on how to perform voice conversion when training data is scarce…


Embodiment Construction

[0096] The structure of the disclosed speech conversion system is shown in Figure 1. Horizontally, the system can be divided into two main parts: the training phase and the conversion phase. In the training phase, source and target speech data are collected and analyzed, feature parameters are extracted, and conversion rules are learned and saved. In the conversion phase, new source speech data to be converted is likewise collected and analyzed and its parameters are extracted. The conversion rules obtained in the training phase are then applied to these parameters, and finally all the converted parameters are synthesized into speech by the speech synthesis module. Generally speaking, the training phase is a non-real-time (offline) phase, while the conversion phase is a real-time (online) phase. Vertically, the system can be divided into four major steps: signal analysis and synthesis, parameter selection and extraction, parameter alignment...
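The split between an offline training phase and an online conversion phase can be illustrated with a small Python sketch. This is a hypothetical simplification, not the patented method: signal analysis and synthesis are omitted, each frame is reduced to one scalar feature, source and target frames are assumed to be already time-aligned, and the learned "conversion rule" is the MMSE mapping of a single joint Gaussian (equivalent to linear regression) rather than the ensemble-learning GMM the patent describes. The names `ConversionRule`, `train_rule`, and `convert` are illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConversionRule:
    """Hypothetical saved conversion rule: statistics of one joint Gaussian."""
    mu_x: float   # mean of source features
    mu_y: float   # mean of target features
    slope: float  # cov(x, y) / var(x)


def train_rule(src: Sequence[float], tgt: Sequence[float]) -> ConversionRule:
    """Offline training phase: learn a rule from time-aligned source/target frames."""
    if len(src) != len(tgt) or len(src) < 2:
        raise ValueError("need at least two aligned frame pairs")
    n = len(src)
    mu_x = sum(src) / n
    mu_y = sum(tgt) / n
    var_x = sum((x - mu_x) ** 2 for x in src) / n
    cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(src, tgt)) / n
    if var_x == 0:
        raise ValueError("source features have zero variance")
    return ConversionRule(mu_x, mu_y, cov_xy / var_x)


def convert(rule: ConversionRule, frames: Sequence[float]) -> list[float]:
    """Online conversion phase: apply the stored rule frame by frame."""
    # For a single joint Gaussian, the MMSE estimate of y given x is
    # E[y | x] = mu_y + cov(x, y) / var(x) * (x - mu_x).
    return [rule.mu_y + rule.slope * (x - rule.mu_x) for x in frames]


# Train once (offline) on a tiny aligned corpus, then convert new frames (online).
rule = train_rule([1.0, 2.0, 3.0, 4.0], [2.5, 4.5, 6.5, 8.5])
print(convert(rule, [5.0, 0.0]))  # [10.5, 0.5]
```

Only the small `ConversionRule` object crosses from the offline phase to the online phase, which is what lets the per-frame conversion step stay cheap enough for real-time use.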


Abstract

The invention provides a real-time voice conversion method for conditions with a minimal amount of training data. The method uses ensemble learning theory to model the collected data with a Gaussian mixture model (GMM) and designs a mapping function under the minimum mean square error (MMSE) criterion. It solves the problem that a standard GMM easily over-fits when the amount of data is very small, and improves the robustness of the voice conversion algorithm to limited training data. In addition, estimating the GMM parameters with this method has lower computational complexity than with a standard GMM, so the method is suitable for real-time voice conversion.
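In the joint-density GMM formulation commonly used for GMM-based voice conversion, the MMSE mapping function is the conditional expectation F(x) = Σ_k P(k | x) [μ_k^y + Σ_k^{yx} (Σ_k^{xx})^{-1} (x − μ_k^x)], where P(k | x) is the posterior probability of component k given the source frame x. The Python sketch below evaluates that mapping for scalar features, assuming the joint GMM parameters have already been estimated. It does not implement the ensemble-learning parameter estimation that the abstract describes; the `Component` type, its field names, and `mmse_map` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Component:
    """One component of a joint GMM over scalar (x, y); hypothetical layout."""
    weight: float  # mixture weight alpha_k (must be > 0)
    mu_x: float    # source-feature mean
    mu_y: float    # target-feature mean
    var_x: float   # source variance Sigma_k^{xx} (must be > 0)
    cov_yx: float  # target/source cross-covariance Sigma_k^{yx}


def mmse_map(x: float, gmm: Sequence[Component]) -> float:
    """Return F(x) = sum_k P(k | x) * (mu_y + cov_yx / var_x * (x - mu_x))."""
    # Log of alpha_k * N(x; mu_x, var_x) under each component's source marginal.
    log_post = [
        math.log(c.weight)
        - 0.5 * math.log(2.0 * math.pi * c.var_x)
        - 0.5 * (x - c.mu_x) ** 2 / c.var_x
        for c in gmm
    ]
    # Normalise with log-sum-exp so frames far from every mean do not underflow.
    peak = max(log_post)
    unnorm = [math.exp(lp - peak) for lp in log_post]
    total = sum(unnorm)
    return sum(
        (u / total) * (c.mu_y + c.cov_yx / c.var_x * (x - c.mu_x))
        for u, c in zip(unnorm, gmm)
    )


# Two symmetric components: a frame midway between them maps to 0.0.
gmm = [
    Component(weight=0.5, mu_x=-1.0, mu_y=-2.0, var_x=1.0, cov_yx=0.5),
    Component(weight=0.5, mu_x=1.0, mu_y=2.0, var_x=1.0, cov_yx=0.5),
]
print(mmse_map(0.0, gmm))  # 0.0
```

Each component contributes a local linear regression, and the posteriors blend them smoothly. With a single component the mapping reduces to ordinary linear regression of y on x. The per-frame cost is linear in the number of components, which is one reason GMM mappings are attractive for real-time conversion.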

Description

Technical Field

[0001] The present invention relates to voice conversion (VC) technology, and in particular to a real-time voice conversion method for use when only a very small amount of training data is available. It is a voice conversion scheme based on a statistical analysis model, intended for text-to-speech systems and robot vocalization systems, and belongs to the technical field of signal processing, specifically speech signal processing.

Background Art

[0002] The field of knowledge involved in this patent is voice conversion technology, a research branch that has emerged in speech signal processing in recent years. It draws on the core technologies of speaker recognition and speech synthesis and combines them toward a unified goal: while keeping the semantic content unchanged, the voice personality characteristics of a specific speaker (called the source speaker) are changed so that what he (or she) said is considered ...


Application Information

IPC(8): G10L13/02; G10L15/06
Inventors: 徐宁 (Xu Ning), 杨震 (Yang Zhen)
Owner: NANJING UNIV OF POSTS & TELECOMM