Many-to-many speaker conversion method based on SE-ResNet STARGAN

A speaker conversion method applied in speech analysis, speech recognition, and related fields. It addresses problems such as network degradation, strengthens useful features, and improves the network's feature extraction and representation capabilities.

Pending Publication Date: 2020-07-17
NANJING UNIV OF POSTS & TELECOMM
Citations: 5; cited by: 4

AI Technical Summary

Problems solved by technology

[0006] Purpose of the invention: the technical problem addressed by the present invention is to provide a many-to-many speaker conversion method based on SE-ResNet STARGAN that further enhances the representation ability of the network, solves the network degradation problem that arises when training existing methods, and eases the difficulty the encoding network has in learning semantic features. The method enables the model to learn deep spectral features, improves the quality of the spectra generated by the decoding network, and fully learns both semantic features and the speaker's personalized features, thereby improving the personality similarity and voice quality of the converted speech.

Embodiment Construction

[0042] In convolutional neural networks, convolution kernels capture local spatial relationships in the form of feature maps, but the resulting channel features are then treated as equally important, so globally irrelevant features propagate through the network and reduce accuracy. To solve this problem, the present invention builds an SE-ResNet network by adding the SE-Net structure (Squeeze-and-Excitation Networks, SE-Net) to ResNet. It models the dependencies between channel features and introduces an attention idea and a gating mechanism to recalibrate the channel features output by the convolutional network, emphasizing useful features and suppressing useless ones. This effectively solves the network degradation problem while further enhancing the representation ability of the model, thereby improving the quality of the spectra generated by the decoding network. The present invention proposes a voice method ba...
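
The channel recalibration described in [0042] can be sketched as a generic squeeze-and-excitation step inside a residual block: global average pooling squeezes each channel to one number, a two-layer bottleneck (ReLU, then sigmoid) turns those numbers into one gate per channel, each channel is rescaled by its gate, and the identity shortcut is added back. This is a minimal pure-Python sketch following the standard SE-Net design, not the patent's implementation. The helper names, the reduction ratio r, the weight values, and placing the SE step on the residual branch before the shortcut addition are all assumptions; residual_fn is a hypothetical stand-in for the block's convolutional layers.

```python
import math


def sigmoid(v):
    """Logistic function used as the excitation gate."""
    return 1.0 / (1.0 + math.exp(-v))


def matvec(weights, vec):
    """Dense layer without bias: returns weights @ vec (weights is a list of rows)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def squeeze_excite(channels, w1, w2):
    """Recalibrate C channels, each given as a flat list of values.

    w1 has shape (C // r) x C and w2 has shape C x (C // r), where r is the
    reduction ratio (a hypothetical choice, not taken from the patent).
    Returns the rescaled channels and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in channels]
    # Excitation: bottleneck layer -> ReLU -> expansion layer -> sigmoid gate in (0, 1).
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gates = [sigmoid(g) for g in matvec(w2, hidden)]
    # Scale: every value of channel c is multiplied by its gate s_c.
    scaled = [[s * v for v in ch] for s, ch in zip(gates, channels)]
    return scaled, gates


def se_residual_block(x, residual_fn, w1, w2):
    """Assumed layout y = x + SE(F(x)): recalibrate the residual branch, then add the shortcut."""
    u = residual_fn(x)  # hypothetical stand-in for the block's convolutional layers
    recalibrated, _ = squeeze_excite(u, w1, w2)
    return [[a + b for a, b in zip(xc, rc)] for xc, rc in zip(x, recalibrated)]


if __name__ == "__main__":
    # Two channels, one bottleneck unit: channel 0 is strongly active, channel 1 is not.
    feats = [[2.0, 2.0, 2.0], [0.0, 0.0, 0.0]]
    w1 = [[1.0, -1.0]]
    w2 = [[5.0], [-5.0]]
    _, gates = squeeze_excite(feats, w1, w2)
    print([round(g, 4) for g in gates])
```

The sigmoid keeps every gate between 0 and 1, so a channel can only be attenuated or passed through, never sign-flipped. Because of the identity shortcut, even a fully suppressed branch still passes x forward unchanged, which is how the residual structure helps deep stacks avoid degradation.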

Abstract

The invention discloses a many-to-many speaker conversion method based on SE-ResNet STARGAN, in which a voice conversion system is built by combining STARGAN with SE-ResNet. On the basis of a residual network, an attention idea and a gating mechanism are introduced to model the dependencies between channels: the weight of each feature channel is learned from global information, and the features are adjusted channel by channel, selectively enhancing useful features while suppressing useless ones. This further enhances the representation capability of the model and effectively solves the problem of network degradation during training. The model's ability to learn semantic information from the speech spectrum and to synthesize the speech spectrum is improved, which raises the personality similarity and synthesis quality of the converted voice and achieves high-quality many-to-many voice conversion under non-parallel text conditions.
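
STARGAN-based converters cover many speaker pairs with a single generator by giving it the target speaker's label together with the source features. The sketch below shows one common way to do this, in the style of StarGAN-VC: a one-hot speaker code is repeated over time and concatenated to the feature map as extra channels. The text above does not state how this patent injects the speaker label or which spectral features it uses, so the function names, the list-of-channels layout, and the channel-concatenation scheme are all assumptions for illustration.

```python
def one_hot(index, num_speakers):
    """One-hot speaker code, e.g. one_hot(1, 3) -> [0.0, 1.0, 0.0]."""
    if not 0 <= index < num_speakers:
        raise ValueError(f"speaker index {index} out of range for {num_speakers} speakers")
    return [1.0 if i == index else 0.0 for i in range(num_speakers)]


def condition_on_speaker(features, speaker_index, num_speakers):
    """Append the target-speaker code to a feature map as extra channels (assumed scheme).

    features: list of C channels, each a list of T frame values.
    Returns C + num_speakers channels; each added channel holds one element of
    the one-hot code repeated across all T frames, telling a single generator
    which training speaker to convert to.
    """
    num_frames = len(features[0])
    code = one_hot(speaker_index, num_speakers)
    label_channels = [[c] * num_frames for c in code]
    return [list(ch) for ch in features] + label_channels


if __name__ == "__main__":
    spec = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 hypothetical spectral channels x 3 frames
    conditioned = condition_on_speaker(spec, speaker_index=1, num_speakers=3)
    for row in conditioned:
        print(row)
```

Because the target is selected by the label rather than by a separate model, one trained generator serves every source-target pair among the training speakers instead of needing one model per pair.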

Description

Technical field

[0001] The invention relates to a many-to-many speaker conversion method, in particular to a many-to-many speaker conversion method based on SE-ResNet STARGAN.

Background

[0002] Speech conversion is an important research branch of speech signal processing, developed and extended on the basis of speech analysis, synthesis, and speaker recognition. The goal of voice conversion is to change the voice personality of the source speaker so that it takes on the voice personality of the target speaker; that is, after conversion, speech spoken by one person sounds like another person's voice while its semantic content is preserved.

[0003] Depending on the training corpus, speech conversion can be divided into conversion under parallel text conditions and conversion under non-parallel text conditions. In practical applications it is difficult to obtain a large parallel training corpus, especially in the fields of cross-lingual and medica...

Application Information

IPC(8): G10L15/08, G10L15/16, G10L15/18, G10L15/06
CPC: G10L15/08, G10L15/16, G10L15/1815, G10L15/063, G10L2015/0631
Inventors: Li Yanping (李燕萍), Cao Pan (曹盼), He Zhengtao (何铮韬)
Owner NANJING UNIV OF POSTS & TELECOMM