
Voice style migration model training method and device, equipment and storage medium

A training method for speech style transfer technology, applied in speech analysis, biological neural network models, and character and pattern recognition, that addresses the lack of emotional speech datasets and parallel speaker datasets.

Status: Pending
Publication Date: 2021-06-18
Owner: PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0004] The main purpose of this application is to provide a training method, device, computer equipment, and computer-readable storage medium for a speech style transfer model. It aims to solve the technical problem that, because existing speech emotion datasets and parallel speaker datasets are scarce, the training of a speech style transfer model cannot be completed with only a small emotional speech dataset and a small parallel speaker dataset.



Examples


Detailed Description of Embodiments

[0033] The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in this application, without creative effort, fall within the scope of protection of this application.

[0034] The flow charts shown in the drawings are only illustrative. They do not necessarily include all contents and operations/steps, nor must the operations/steps be performed in the order described. For example, some operations/steps can be decomposed, combined, or partly combined, so the actual order of execution may change according to the actual situation.

[0035] Embodiments of the present application provide a speech style transfer model training method, device, comput...


Abstract

The invention relates to the technical field of artificial intelligence and discloses a training method, device, equipment, and storage medium for a voice style migration (speech style transfer) model. The method comprises: obtaining a first update parameter from a preset neural network model according to first mel-spectrogram information and second mel-spectrogram information; inputting the first and second mel-spectrogram information into a preset classifier to obtain a corresponding first style reward parameter; determining a first content reward parameter from the second mel-spectrogram information; obtaining a second update parameter according to the first style reward parameter and the first content reward parameter; and updating the model parameters of the preset neural network model with the first and second update parameters to generate the corresponding style transfer model. The method thereby realizes audio-to-audio style transfer, with the classifier enabling fine-grained style transfer. In addition, the conversion from source audio to target audio is driven along two dimensions, style reward and content reward, without collecting a large corpus of target audio.
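The update scheme in the abstract can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration, not the patent's implementation: the "network" is a per-bin offset, the "classifier" is a sigmoid over the mean bin value applied to the model output, the content reward is a negative squared distance to the second mel frame, and the second update is a finite-difference gradient of the combined reward (the excerpt does not say how the second update parameter is actually computed). All function names are hypothetical.

```python
import math

def model(theta, mel):
    """Toy stand-in for the preset network: adds a learned offset to each mel bin."""
    return [m + t for m, t in zip(mel, theta)]

def reconstruction_grad(theta, first_mel, second_mel):
    """Gradient w.r.t. theta of the mean squared error between model(first_mel) and second_mel."""
    out = model(theta, first_mel)
    n = len(out)
    return [2.0 * (o - s) / n for o, s in zip(out, second_mel)]

def style_reward(mel):
    """Toy classifier (assumption): sigmoid of the mean bin value, read as P(target style)."""
    mean = sum(mel) / len(mel)
    return 1.0 / (1.0 + math.exp(-mean))

def content_reward(mel, second_mel):
    """Assumed content reward: negative mean squared distance to the second mel frame."""
    return -sum((m - s) ** 2 for m, s in zip(mel, second_mel)) / len(mel)

def reward(theta, first_mel, second_mel):
    """Combined reward: style reward plus content reward of the model output."""
    out = model(theta, first_mel)
    return style_reward(out) + content_reward(out, second_mel)

def reward_grad(theta, first_mel, second_mel, eps=1e-4):
    """Central finite-difference gradient of the combined reward (a stand-in estimator)."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        diff = reward(up, first_mel, second_mel) - reward(down, first_mel, second_mel)
        grad.append(diff / (2.0 * eps))
    return grad

def train_step(theta, first_mel, second_mel, lr=0.1):
    """Apply both updates: descend the reconstruction loss, ascend the combined reward."""
    first_update = reconstruction_grad(theta, first_mel, second_mel)
    second_update = reward_grad(theta, first_mel, second_mel)
    return [t - lr * g1 + lr * g2
            for t, g1, g2 in zip(theta, first_update, second_update)]

# Made-up three-bin "mel frames" for demonstration only.
first_mel = [0.0, 0.5, 1.0]
second_mel = [1.0, 1.0, 1.0]
theta = [0.0, 0.0, 0.0]

initial_reward = reward(theta, first_mel, second_mel)
for _ in range(200):
    theta = train_step(theta, first_mel, second_mel)
final_reward = reward(theta, first_mel, second_mel)
print(initial_reward, final_reward)
```

In this sketch the two updates are simply summed with a shared learning rate; a real system would likely weight them separately and compute both with automatic differentiation over a neural network.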

Description

Technical field

[0001] The present application relates to the technical field of artificial intelligence, and in particular to a training method, device, computer equipment, and computer-readable storage medium for a speech style transfer model.

Background

[0002] In recent years, with the success of neural networks, text-to-speech (TTS) has advanced rapidly and has basically achieved end-to-end speech synthesis. Various models based on the spectrogram prediction network Tacotron2 have improved the naturalness of synthesized speech to some extent, but they lack control over the speaker's prosody and style. Such control places high demands on the model, and its subdivisions include style transfer, cross-language synthesis, etc.

[0003] Speech style transfer extracts the characteristics of the speaker's timbre, style, emotion, etc., and then performs specific operations on the extracted feature vectors in the in...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L25/24, G10L25/30, G06K9/62, G06N3/02
CPC: G10L25/24, G10L25/30, G06N3/02, G06F18/214
Inventor: 孙奥兰, 王健宗, 程宁
Owner: PING AN TECH (SHENZHEN) CO LTD