Emotion speech synthesis method and device based on deep learning

A technology of speech synthesis and deep learning, applied in speech synthesis, speech analysis, instruments, etc.

Active Publication Date: 2020-01-10

SUNING CLOUD COMPUTING CO LTD

3 Cites, 6 Cited by


AI Technical Summary

Problems solved by technology

[0005] To solve the above technical problems, the present invention provides an emotional speech synthesis method and device based on deep learning, which can synthesize emotional speech without manually labeling emotions one by one.



Examples


Embodiment 1

[0078] As shown in Figure 1, this embodiment provides a method for emotional speech synthesis based on deep learning, which belongs to the field of speech synthesis. With this method, emotional speech can be synthesized without manually labeling emotions, and the emotional naturalness of the synthesized speech can be effectively improved.

[0079] As shown in Figures 1 and 2, the method includes the following steps:

[0080] S1. Extract the text information to be processed and the preceding information of the text information to be processed.

[0081] Specifically, when the processing object is a text object, the preceding information includes the preceding text information;

[0082] When the processing object is a voice object or a video object, the preceding information includes the preceding text information and the preceding voice information.

[0083] It should be noted that in this step, extracting text information from text objects, extractin...
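The rule in step S1 for which preceding information accompanies each kind of processing object can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the names `ObjectKind`, `PrecedingInfo`, and `extract_preceding` are hypothetical, and raising an error when a voice or video object lacks preceding voice information is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ObjectKind(Enum):
    """Kinds of processing object named in step S1 (hypothetical enum)."""
    TEXT = "text"
    VOICE = "voice"
    VIDEO = "video"


@dataclass
class PrecedingInfo:
    text: str                      # preceding text information (always present)
    voice: Optional[bytes] = None  # preceding voice information (voice/video only)


def extract_preceding(kind: ObjectKind, prev_text: str,
                      prev_voice: Optional[bytes] = None) -> PrecedingInfo:
    """Select the preceding information that applies to the object kind."""
    if kind is ObjectKind.TEXT:
        # Text object: only the preceding text is kept; any audio is ignored.
        return PrecedingInfo(text=prev_text)
    # Voice or video object: both preceding text and preceding voice are kept.
    if prev_voice is None:
        # Assumption: treat missing preceding voice as an error.
        raise ValueError("voice/video objects need preceding voice information")
    return PrecedingInfo(text=prev_text, voice=prev_voice)
```

For example, `extract_preceding(ObjectKind.TEXT, "Hello.")` yields a record whose `voice` field is `None`, while a video object keeps both fields.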

Embodiment 2

[0127] In order to implement the method for emotional speech synthesis based on deep learning in the first embodiment above, this embodiment provides an apparatus 100 for emotional speech synthesis based on deep learning.

[0128] Figure 5 is a schematic structural diagram of the deep learning-based emotional speech synthesis apparatus 100. As shown in Figure 5, the apparatus 100 includes at least:

[0129] Extraction module 1: used to extract the text information to be processed and the preceding information of the text information to be processed, where the preceding information includes the preceding text information;

[0130] Emotional feature information generation module 2: used to generate emotional feature information through the pre-built first model by taking the text information to be processed and the preceding information as input;

[0131] Emotional speech synthesis module 3: for synthesizing emotional speech through the pre-trained second model by taking ...
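The way the three modules chain together can be sketched as below. This is a hedged sketch under stated assumptions: the class `EmotionalTTSDevice`, its method names, and the two toy models are hypothetical stand-ins, not the trained first and second models of the patent. The inputs of the second model (emotional feature information plus the text to be processed) follow the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

EmotionFeatures = List[float]  # assumed representation of emotional feature information


@dataclass
class EmotionalTTSDevice:
    # Both models are injected as callables; real ones would be neural networks.
    first_model: Callable[[str, str], EmotionFeatures]
    second_model: Callable[[EmotionFeatures, str], bytes]

    def extract(self, sentences: List[str], index: int) -> Tuple[str, str]:
        """Module 1: return the text to process and its preceding text."""
        return sentences[index], " ".join(sentences[:index])

    def generate_emotion(self, text: str, preceding: str) -> EmotionFeatures:
        """Module 2: first model maps (text, preceding info) to emotion features."""
        return self.first_model(text, preceding)

    def synthesize(self, features: EmotionFeatures, text: str) -> bytes:
        """Module 3: second model maps (emotion features, text) to speech."""
        return self.second_model(features, text)

    def run(self, sentences: List[str], index: int) -> bytes:
        text, preceding = self.extract(sentences, index)
        features = self.generate_emotion(text, preceding)
        return self.synthesize(features, text)


def toy_first_model(text: str, preceding: str) -> EmotionFeatures:
    # Placeholder: count exclamation marks as a one-dimensional "emotion" score.
    return [float((preceding + text).count("!"))]


def toy_second_model(features: EmotionFeatures, text: str) -> bytes:
    # Placeholder: tag the text instead of producing real audio.
    return f"<emotion={features[0]:.1f}>{text}".encode("utf-8")


device = EmotionalTTSDevice(toy_first_model, toy_second_model)
result = device.run(["Great news!", "We won."], 1)
```

Passing the models in as callables keeps the three-stage data flow visible without committing to any particular network architecture.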


Abstract

The invention discloses an emotional speech synthesis method and device based on deep learning, belonging to the field of speech synthesis. The method comprises at least the following steps: extracting the text information to be processed and the preceding information of that text information, wherein the preceding information comprises preceding text information; generating emotional feature information through a pre-constructed first model, taking the text information to be processed and the preceding information as input; and synthesizing emotional speech through a pre-trained second model, taking the emotional feature information and the text information to be processed as input. With this method, emotional speech can be synthesized based on deep learning from text information alone, without manually labeling the emotion of each acoustic pronunciation in advance. The method reduces labeling errors as well as labor costs, improves the suitability of the emotional information, enriches the emotional expression of conversational speech, improves the naturalness and fluency of the synthesized speech, improves the human-machine communication experience, and is widely applicable.

Description

Technical field

[0001] The present invention relates to the field of speech synthesis, in particular to an emotional speech synthesis method and device based on deep learning.

Background art

[0002] With the development of society, people hope that machines can replace humans in performing simple, frequent tasks such as broadcasting and basic customer service, and hope to communicate with machines naturally and harmoniously. Speech, as an important means of communication in human society, largely determines whether such natural and harmonious human-machine communication can be achieved. Speech synthesis therefore has significant research value in the fields of affective computing and signal processing, and fine-grained emotional expression can greatly improve the naturalness of synthesized speech.

[0003] The existing practice is generally based on labeling information, manually labeling text, emotion, etc. for each a...


Application Information


IPC(8): G10L13/02, G10L13/08, G10L25/27, G10L25/63

CPC: G10L13/02, G10L13/08, G10L25/27, G10L25/63

Inventor: 钟雨崎

Owner: SUNING CLOUD COMPUTING CO LTD
