Speech conversion method based on speaker model alignment under asymmetric speech database conditions

A speaker-model-based speech conversion technology, applicable to speech analysis, speech synthesis, instruments, and related fields. It addresses problems of existing methods, such as the need for many training sentences and reliance on inaccurate assumptions, that limit the practical application of speech conversion technology.

Inactive Publication Date: 2014-12-17
SOUTHEAST UNIV

AI Technical Summary

Problems solved by technology

However, these methods still have significant limitations. The maximum-likelihood constrained adaptation method requires pre-training to obtain the conversion function of a reference speaker. The INCA method assumes that adjacent spectral features in the feature space correspond to the same phoneme; in practice this assumption is often inaccurate, and the training method requires a relatively large number of training sentences. The speech conversion method based on speaker adaptation relies on a model trained on third-party speakers.
These problems therefore greatly limit the practical application of speech conversion technology under asymmetric speech database conditions.
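To make the INCA assumption concrete, the following is a minimal sketch of a nearest-neighbor pairing step of the kind that assumption implies: each source frame is paired with the closest target frame in feature space. It assumes frames are stored as rows of NumPy arrays of spectral features; the function name `nearest_neighbor_pairs` is hypothetical, and the sketch shows only the pairing step, not the full INCA training loop.

```python
import numpy as np

def nearest_neighbor_pairs(src, tgt):
    """For each source frame (row of src), return the index of the
    closest target frame (row of tgt) under Euclidean distance."""
    # Pairwise squared distances, shape (n_src, n_tgt), via broadcasting.
    d2 = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Toy 2-D "spectral features" (illustrative values only).
src = np.array([[0.0, 0.0], [5.0, 5.0]])
tgt = np.array([[4.9, 5.2], [0.1, -0.1], [10.0, 10.0]])
pairs = nearest_neighbor_pairs(src, tgt)
```

If two frames from different phonemes happen to lie close together, they are paired anyway, which is the inaccuracy noted above.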



Embodiment Construction

[0047] The technical solutions of the present invention will be further elaborated below in conjunction with the accompanying drawings and embodiments.

[0048] Speech conversion is a relatively new research direction in the field of speech signal processing and has developed considerably over the past few decades. Research at home and abroad has focused mainly on speech conversion based on symmetric speech corpora, but in practice such corpora are usually difficult to obtain directly. To address this, the present invention proposes a new speech conversion method for asymmetric speech database conditions, based on speaker model alignment. First, the models of the source speaker and the target speaker are trained separately; then, the speaker models are iteratively aligned using their mean and covariance parameters, so as to obtain the conversion function of the sp...


Abstract

The invention provides a speech conversion method based on speaker model alignment under asymmetric speech database conditions. The method first trains on the spectral features of the source speaker and the target speaker separately to obtain the speaker models. It then uses the speaker model parameters to find the conversion function between the source speaker's feature vector and an auxiliary vector, and the conversion function between the auxiliary vector and the target speaker's feature vector. Finally, it combines the two conversion functions to obtain the conversion function between the source speaker and the target speaker. During speech conversion, the speaker model alignment method is adopted, and the conversion performance is further improved by combining speaker model alignment with a Gaussian mixture model. Experimental results show that the method outperforms the traditional INCA-based speech conversion method in spectral distortion and correlation as well as in converted speech quality and similarity.
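The two-stage mapping through an auxiliary vector can be illustrated with a minimal sketch. It assumes, purely for illustration, that each speaker model is reduced to a single Gaussian (mean and covariance) and that the auxiliary vector is the whitened feature (zero mean, identity covariance). These choices, and the names `sqrtm_spd`, `to_auxiliary`, `from_auxiliary`, and `convert`, are assumptions of this sketch and not the patent's actual formulas, which are not reproduced in this text.

```python
import numpy as np

def sqrtm_spd(cov):
    """Symmetric square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return (vecs * np.sqrt(vals)) @ vecs.T

def to_auxiliary(x, mean_src, cov_src):
    """Source features (rows of x) -> auxiliary vectors.
    Assumption: the auxiliary vector is the whitened source feature."""
    return np.linalg.solve(sqrtm_spd(cov_src), (x - mean_src).T).T

def from_auxiliary(z, mean_tgt, cov_tgt):
    """Auxiliary vectors (rows of z) -> target-like features."""
    return z @ sqrtm_spd(cov_tgt) + mean_tgt

def convert(x, mean_src, cov_src, mean_tgt, cov_tgt):
    """Compose the two mappings: source -> auxiliary -> target."""
    z = to_auxiliary(x, mean_src, cov_src)
    return from_auxiliary(z, mean_tgt, cov_tgt)
```

Because each stage depends only on one speaker's model parameters, no frame-level pairing between source and target utterances is needed in this sketch, which is the appeal of a model-level alignment when the corpora are asymmetric.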

Description

Technical field

[0001] The invention relates to speech conversion technology, in particular to a speech conversion method under asymmetric speech database conditions, and belongs to the technical field of speech signal processing.

Background

[0002] Speech conversion refers to a technology that changes the voice characteristics of one speaker (the source speaker) into those of another speaker (the target speaker). Speech conversion technology has a wide range of applications, such as personalized speech synthesis, speaker identity disguise in secure communication, restoration of damaged speech in the medical field, and recovery of the speaker's personal characteristics at the receiving end of low-bit-rate speech communication.

[0003] In order to achieve high-quality speaker personality conversion, domestic and foreign scholars have proposed many speech conversion methods, such as codebook mapping method, Gaussian mixture mod...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L17/04, G10L13/08
Inventors: 宋鹏, 赵力, 金赟
Owner: SOUTHEAST UNIV