Method and device for synthesizing emotional speech of specific speaker under extremely low resources

A speech synthesis technology for emotional speech of a specific speaker, applied in the fields of computer equipment and storage media, which can solve the problem of high cost

Pending Publication Date: 2020-09-04
SHENGZHI INFORMATION TECH NANJING CO LTD
9 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

The advantage of the neural network synthesis method is that the processing is simple and the synthesis effect is very natural. The disadvantage is that a large data set is required to tr


Examples

Embodiment Construction

[0031] In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, and are not intended to limit the present application.

[0032] Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The occurrences of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein can be combined with other embodiment...


Abstract

The invention discloses a method and a device for synthesizing emotional speech of a specific speaker under extremely low resources, together with computer equipment and a storage medium. The method comprises the following steps: acquiring a training text and the audio corresponding to it, converting the training text into a phoneme sequence, and embedding a slot with an emotion vector to obtain initial training data; inputting the initial training data into a deep learning model and training to obtain a base model; acquiring a specific text and a specific audio, converting the specific text into the corresponding phoneme sequence, and embedding a slot with an emotion vector into that sequence to obtain specific training data; inputting the specific training data into the base model and training to obtain a speech synthesis model; converting a to-be-synthesized text into a phoneme sequence to obtain a to-be-synthesized phoneme sequence, and filling it into the emotion slot to obtain synthetic input data; and inputting the synthetic input data into the speech synthesis model to obtain speech audio with a specific emotion. This reduces the cost of obtaining speech audio with a specific emotion and improves the flexibility of the emotional speech synthesis scheme.
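The data preparation shared by all three stages (base training, speaker-specific training, synthesis) can be illustrated with a minimal Python sketch. It only shows how a text might be converted to a phoneme sequence and paired with an emotion-vector slot. Everything named here is an assumption made for illustration and does not come from the patent: the toy `LEXICON`, the one-hot `EMOTIONS` table, the `ModelInput` container, and the choice to store the slot alongside the phonemes. The patent text available here does not specify the phoneme set, the emotion vector format, or the model architecture.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical toy lexicon (ARPAbet-style symbols); a real system would
# use a full grapheme-to-phoneme front end instead.
LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Hypothetical emotion vectors; one-hot purely for illustration.
EMOTIONS: Dict[str, List[float]] = {
    "neutral": [1.0, 0.0, 0.0],
    "happy": [0.0, 1.0, 0.0],
    "sad": [0.0, 0.0, 1.0],
}


@dataclass
class ModelInput:
    """One example: an emotion slot plus the phoneme sequence."""
    emotion_slot: List[float]
    phonemes: List[str]


def text_to_phonemes(text: str) -> List[str]:
    """Convert whitespace-separated words to a flat phoneme sequence."""
    phonemes: List[str] = []
    for word in text.lower().split():
        if word not in LEXICON:
            raise KeyError(f"word not in toy lexicon: {word!r}")
        phonemes.extend(LEXICON[word])
    return phonemes


def build_input(text: str, emotion: str) -> ModelInput:
    """Pair the phoneme sequence with the slot holding the emotion vector."""
    if emotion not in EMOTIONS:
        raise KeyError(f"unknown emotion label: {emotion!r}")
    return ModelInput(
        emotion_slot=list(EMOTIONS[emotion]),
        phonemes=text_to_phonemes(text),
    )


if __name__ == "__main__":
    # The same preparation would be applied to the large base corpus,
    # to the small speaker-specific corpus, and to the text to synthesize.
    example = build_input("Hello world", "happy")
    print(example.emotion_slot, example.phonemes)
```

In this sketch the emotion is selected at synthesis time simply by calling `build_input` with a different label, which is one way to read the flexibility the abstract describes. The two training stages themselves (base model, then fine-tuning on the specific speaker's data) are not shown.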

Description

Technical field

[0001] The present invention relates to the technical field of speech signal processing, and in particular to a method, device, computer equipment and storage medium for synthesizing emotional speech of a specific speaker under extremely low resources.

Background technique

[0002] Speech synthesis technology gives computers (or various terminal devices) the ability to speak like humans, and is a typical interdisciplinary subject. TTS (text-to-speech) technology belongs to speech synthesis; it converts text information, whether generated by the computer itself or input externally, into intelligible and fluent speech output. Emotional speech synthesis is a research field that has emerged only in the past ten years. Compared with traditional speech synthesis, emotional speech synthesis takes into account the speaker's emotional state and speaking style, making the synthesized speech more intelligent and humanized, with more Wide...


Application Information

IPC(8): G10L13/08, G10L13/04, G10L25/63
CPC: G10L13/08, G10L25/63
Inventor: 袁熹
Owner: SHENGZHI INFORMATION TECH NANJING CO LTD