Cross-language timbre conversion system and method based on zero-shot learning

A cross-language timbre conversion technology, applied in speech analysis, speech recognition, instruments, etc. It addresses poor language-transfer performance, poor model generalization, and insufficient decoupling of semantics and timbre, and achieves low difficulty, ease of use, and improved robustness.

Pending Publication Date: 2021-05-07
SOUTH CHINA UNIV OF TECH
AI Technical Summary

Problems solved by technology

[0005] The problem to be solved by the present invention is to provide a cross-language timbre conversion method based on zero-shot learning that addresses three common problems of timbre conversion models such as GAN and VAE. First, the extracted features depend heavily on the data distribution of the training set, so the trained model generalizes poorly. Second, the benchmark methods do not fully decouple semantics from timbre and do not explicitly separate the speech content from the speaker's timbre. Third, the benchmark methods struggle to retain the pronunciation habits of a specific language for cross-lingual and dialect-accented speech, and they perform poorly in language transfer.
The present invention extracts high-level semantic features with the aid of a phoneme recognition model that strips away most speaker information, and converts them into the target speaker's voice in combination with the target speaker feature vector and the fundamental frequency. By relying on phoneme features, it addresses the problem of languages that do not exist in the training data, so that the model can convert speech in a variety of dialect accents and foreign languages.

Examples


Embodiment Construction

[0044] In order to make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the drawings. Obviously, the described embodiments are a part of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0045] A cross-language timbre conversion system based on zero-shot learning, including a phoneme recognition module, a timbre conversion module, a speaker encoding module, and a vocoder module;
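As a rough illustration of how the four modules listed above might be wired together, the sketch below treats each module as a plain callable. The class name `TimbreConversionSystem`, the method `convert`, the type aliases, and the idea that the speaker encoding module derives its vector from a reference utterance are all assumptions made for illustration; the source text does not specify these interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases, not taken from the patent text.
Mel = List[List[float]]       # frames x mel bins
Features = List[List[float]]  # frames x feature dimensions
Vector = List[float]          # speaker feature vector


@dataclass
class TimbreConversionSystem:
    """Hypothetical container for the four modules named in [0045]."""
    phoneme_recognizer: Callable[[Mel], Features]
    speaker_encoder: Callable[[Mel], Vector]
    timbre_converter: Callable[[Features, Vector], Mel]
    vocoder: Callable[[Mel], List[float]]

    def convert(self, source_mel: Mel, reference_mel: Mel) -> List[float]:
        # Content features that should carry little speaker information.
        content = self.phoneme_recognizer(source_mel)
        # Target speaker vector (assumed to come from reference speech).
        speaker = self.speaker_encoder(reference_mel)
        # Mel spectrum synthesized in the target speaker's timbre.
        converted_mel = self.timbre_converter(content, speaker)
        # Waveform samples produced from the converted Mel spectrum.
        return self.vocoder(converted_mel)
```

Keeping each module behind a callable makes it possible to swap in a trained network or a test stub without changing the data flow.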

[0046] The phoneme recognition module G_p is a hybrid neural network including a 6-layer time-delay neural network and a 2-layer long short-term memory n...
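For readers unfamiliar with time-delay neural networks, the following minimal sketch shows what a single time-delay layer generally does: it splices each frame with neighboring frames at fixed offsets and applies an affine transform followed by a nonlinearity. The function names, the context offsets, the edge clamping, and the ReLU activation are illustrative assumptions; the layer configuration of the patented module is not given in the visible text.

```python
from typing import List, Sequence


def splice_context(frames: List[List[float]],
                   offsets: Sequence[int]) -> List[List[float]]:
    """Concatenate each frame with its neighbors at the given time offsets."""
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        row: List[float] = []
        for off in offsets:
            idx = min(max(t + off, 0), last)  # clamp at utterance edges
            row.extend(frames[idx])
        spliced.append(row)
    return spliced


def tdnn_layer(frames: List[List[float]],
               offsets: Sequence[int],
               weights: List[List[float]],
               bias: List[float]) -> List[List[float]]:
    """One time-delay layer: affine transform of spliced context, then ReLU."""
    out = []
    for row in splice_context(frames, offsets):
        out.append([
            max(0.0, sum(w * x for w, x in zip(w_row, row)) + b)
            for w_row, b in zip(weights, bias)
        ])
    return out
```

Stacking several such layers widens the temporal context seen by each output frame, which is the usual motivation for using them ahead of recurrent layers.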

Abstract

The invention discloses a cross-language timbre conversion system and method based on zero-shot learning. The system comprises, in sequence, a hybrid phoneme recognition module, a timbre conversion module, a speaker encoding module, and a vocoder module. The Mel spectrum of the speech signal serves as the input. The phoneme recognition module extracts bottleneck features from the Mel spectrum, and the features are normalized and passed to an acoustic model. The Mel spectrum synthesized by the acoustic model is controlled through a speaker reference vector, and a vocoder finally synthesizes the audio. The system can convert the voice of an ordinary speaker into the timbre of a specified speaker, works on accented speech that does not appear in the training database, is suitable for voice conversion of dialects from multiple regions, and has broad application prospects.
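The two intermediate steps in the abstract, normalizing the bottleneck features and steering the acoustic model with a speaker reference vector, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which normalization is used, so per-utterance mean-variance normalization is assumed here, and conditioning by concatenating the speaker vector to every frame is likewise an assumption. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Frame = List[float]


def normalize_features(frames: List[Frame], eps: float = 1e-8) -> List[Frame]:
    """Per-utterance, per-dimension mean-variance normalization (assumed)."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]


def condition_on_speaker(frames: List[Frame],
                         speaker_vec: Sequence[float]) -> List[Frame]:
    """Append the same speaker reference vector to every frame (assumed)."""
    return [f + list(speaker_vec) for f in frames]
```

Under these assumptions, changing only `speaker_vec` while keeping the normalized content features fixed is what would let the acoustic model render the same content in a different timbre.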

Description

Technical field

[0001] The invention belongs to the field of speech synthesis technology, and in particular relates to a zero-shot learning-based cross-language timbre conversion system and method.

Background

[0002] Voice conversion technology is designed to convert the timbre of a voice into that of another specified person (B. Sisman, J. Yamagishi, S. King and H. Li, "An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 132-157, 2021, doi:10.1109/TASLP.2020.3038524). It is a branch of speech synthesis, but it also involves technologies from the related fields of speech recognition and speaker recognition. One of the core issues of timbre conversion technology is how to decouple the content and timbre of the speech, which not only ensures the integrity of the speech content, but also ensures that the ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L21/013, G10L25/87, G10L25/24, G10L25/30, G10L15/02, G10L15/06, G10L15/07, G10L19/02, G10L19/26
CPC: G10L21/013, G10L25/87, G10L25/24, G10L19/0212, G10L19/26, G10L15/02, G10L15/063, G10L25/30, G10L15/07, G10L2021/0135, G10L2015/025
Inventors: 杨镇川, 张伟彬, 徐向民, 邢晓芬, 陈艺荣
Owner: SOUTH CHINA UNIV OF TECH