Method for synthesizing emotional speech by utilizing transfer learning under low resources

A speech synthesis and transfer learning technology, applied in speech synthesis, speech analysis, instruments, etc. It addresses problems such as the high cost of acquiring data and the lack of access to suitable datasets.

Pending Publication Date: 2020-11-17
TIANJIN UNIV
Cites: 0; Cited by: 1

AI Technical Summary

Problems solved by technology

Emotional speech synthesis trained on large amounts of data has reached an acceptable level. In some special cases, however, it may not be possible to obtain a large amount of training data, or obtaining it may be relatively costly.

Examples

Embodiment Construction

[0045] The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it.

[0046] This embodiment provides a method for emotional speech synthesis using transfer learning under low-resource conditions. In practice, this embodiment uses two datasets, EMOV-DB and LJSpeech-1.1. EMOV-DB is a low-resource emotional speech synthesis dataset whose text is based on the CMU Arctic database. It includes recordings from four speakers, two men and two women, and its emotion categories are neutral, sleepy, angry, disgusted, and amused. LJSpeech-1.1 is a single-speaker, emotionally neutral speech synthesis dataset containing 13,100 short audio clips of one speaker reading passages from 7 non-fiction books. Tran...
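The following Python sketch illustrates, under stated assumptions, how the two corpora described above might be organized before training. It relies only on the documented LJSpeech-1.1 `metadata.csv` layout (one record per line, pipe-separated: clip ID, raw transcription, normalized transcription). The `Utterance` record, the emotion-to-index order, and any speaker IDs passed to `select_speaker` are hypothetical and are not taken from the patent.

```python
"""Hypothetical data-preparation sketch for EMOV-DB and LJSpeech-1.1 (not the patent's code)."""
from dataclasses import dataclass

# EMOV-DB emotion categories mapped to class indices for an emotion classifier.
# The ordering here is an illustrative assumption.
EMOTIONS = ("neutral", "sleepy", "angry", "disgusted", "amused")
EMOTION_TO_ID = {name: i for i, name in enumerate(EMOTIONS)}


@dataclass
class Utterance:
    clip_id: str
    text: str
    emotion: str = "neutral"  # LJSpeech-1.1 is treated as emotionally neutral
    speaker: str = "LJ"       # LJSpeech-1.1 has a single speaker


def parse_ljspeech_metadata(lines):
    """Parse metadata.csv rows of the form: id|raw transcription|normalized transcription."""
    utterances = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        clip_id, raw_text, normalized = line.split("|", 2)
        # Prefer the normalized text (numbers and abbreviations expanded) when present.
        utterances.append(Utterance(clip_id, normalized or raw_text))
    return utterances


def select_speaker(utterances, speaker):
    """Keep a single speaker's recordings, matching the low-resource, single-speaker setting."""
    return [u for u in utterances if u.speaker == speaker]
```

In this sketch the emotional EMOV-DB utterances would be built as `Utterance` records with an explicit `emotion` and `speaker`, and `EMOTION_TO_ID` would supply the class targets when training the emotion-recognition model.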

Abstract

The invention discloses a method for synthesizing emotional speech using transfer learning under low-resource conditions, comprising the following steps: (1) pre-training the emotion vector: a speech emotion recognition model is trained on the EMOV-DB dataset, where this model is obtained by further processing the style-vector extraction part of GST + Tacotron2, a baseline model for stylized end-to-end speech synthesis; (2) pre-training the speech synthesis model: a basic Tacotron2 model is pre-trained on the LJSpeech-1.1 dataset; and (3) transfer-learning training: for the basic Tacotron2 model, the intermediate result obtained in step 1 is connected to the encoder output, and transfer-learning training is carried out. By adopting pre-training and transfer learning, the method makes full use of a small amount of emotional data from a single speaker and, with a single unified emotional speech synthesis model, synthesizes speech of reasonable quality with a clearly perceptible emotional tendency.
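Steps 1 and 3 can be illustrated with a deliberately simplified, dependency-free Python sketch. It assumes a single attention head over a small bank of learned style tokens (the published GST design uses multi-head attention), and it assumes that "connecting" the emotion vector to the encoder result means concatenating it onto every encoder time step. The function names are hypothetical; in the described method the reference embedding and tokens would come from the emotion-recognition model trained in step 1, not from hand-written values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_token_embedding(reference, tokens):
    """Single-head, GST-style attention (a simplifying assumption).

    reference: reference embedding (the query), length d.
    tokens: bank of learned style tokens, each of length d.
    Returns the attention-weighted sum of the tokens, used here as the
    emotion vector, together with the attention weights.
    """
    d = len(reference)
    keys = [[math.tanh(x) for x in tok] for tok in tokens]  # squash token values
    scores = [sum(r * k for r, k in zip(reference, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    emotion_vec = [sum(w * key[j] for w, key in zip(weights, keys)) for j in range(d)]
    return emotion_vec, weights


def concat_emotion(encoder_outputs, emotion_vec):
    """Step 3 (assumed form): append the same emotion vector to every encoder frame."""
    return [list(frame) + list(emotion_vec) for frame in encoder_outputs]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy bank of 3 tokens, d = 2
    emotion, weights = style_token_embedding([1.0, 0.0], tokens)
    frames = concat_emotion([[0.1, 0.2, 0.3]] * 4, emotion)  # 4 encoder frames of width 3
    print(len(frames), len(frames[0]))  # 4 5
```

Concatenation widens each encoder frame by the size of the emotion vector, so any layer that consumes the encoder output (for example, Tacotron2's attention) must accept the larger input. This is one reason not every pre-trained weight can be reused unchanged in step 3.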

Description

Technical field

[0001] The invention relates to the field of speech synthesis, and in particular to a method for emotional speech synthesis that uses existing data for transfer learning under low-resource conditions.

Background art

[0002] In recent years, end-to-end speech synthesis has developed rapidly. When models are trained on large datasets, the quality and clarity of synthesized speech have improved greatly, and emotional speech synthesis trained on large amounts of data has reached an acceptable level. In some special cases, however, a large dataset may not be obtainable for training, or acquiring one may be relatively costly.

SUMMARY OF THE INVENTION

[0003] The purpose of the present invention is to overcome the deficiencies of the prior art and to provide a method for emotional speech synthesis using transfer learning under low-resource conditions. The methods of learning and model pre-train...
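The pre-training and transfer-learning steps in the abstract (pre-train Tacotron2 on LJSpeech-1.1, then continue training with the emotion vector attached) are commonly started by reusing every pre-trained parameter whose name and shape still match. The sketch below assumes that strategy; the patent does not state which layers are reused, frozen, or re-initialized, and the parameter names in any example are hypothetical. Tensors are plain nested lists to keep it dependency-free.

```python
import copy


def shape(tensor):
    """Shape of a nested-list tensor, e.g. [[1, 2, 3]] -> (1, 3)."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(dims)


def transfer_weights(pretrained, target):
    """Initialize `target` from `pretrained` wherever parameter name and shape match.

    Returns the merged parameters plus the names that were loaded and skipped.
    Skipped parameters keep their fresh initialization (for example, layers whose
    input grew after the emotion vector was concatenated to the encoder output).
    """
    merged, loaded, skipped = dict(target), [], []
    for name, value in target.items():
        source = pretrained.get(name)
        if source is not None and shape(source) == shape(value):
            merged[name] = copy.deepcopy(source)  # reuse pre-trained weights
            loaded.append(name)
        else:
            skipped.append(name)  # new or resized parameter: keep its initialization
    return merged, loaded, skipped
```

In PyTorch, an equivalent filtered dictionary would typically be passed to `model.load_state_dict(filtered, strict=False)`; the filtering step is still needed because `strict=False` tolerates missing or unexpected keys but not shape mismatches.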

Application Information

Patent type & authority: Applications (China)
IPC (8): G10L13/02; G10L13/08; G10L25/63
CPC: G10L13/02; G10L13/08; G10L25/63
Inventors: Wang Longbiao (王龙标), Xu Jie (徐杰), Dang Jianwu (党建武), Gong Cheng (贡诚)
Owner: TIANJIN UNIV