A deep learning-based emotional speech synthesis method and device

A speech synthesis and deep learning technology, applied to speech synthesis, speech analysis, and instruments, that enriches the emotion of conversational speech, improves naturalness, and improves the human-computer communication experience

Active Publication Date: 2022-07-05
SUNING CLOUD COMPUTING CO LTD

AI Technical Summary

Problems solved by technology

[0005] To solve the above technical problems, the present invention provides a deep learning-based emotional speech synthesis method and device that can synthesize emotional speech without manually labeling emotions one by one.

Method used



Examples


Embodiment 1

[0078] As shown in Figure 1, this embodiment provides a deep learning-based emotional speech synthesis method, which belongs to the field of speech synthesis. With this method, emotional speech can be synthesized without manually marking emotions, and the naturalness of the emotion in the synthesized speech can be effectively improved.

[0079] As shown in Figures 1 and 2, the method includes the following steps:

[0080] S1. Extract the text information to be processed and the preceding information of the text information to be processed.

[0081] Specifically, when the processing object is a text object, the preceding information includes preceding text information;

[0082] When the processing object is a speech object or a video object, the preceding information includes preceding text information and preceding speech information.

[0083] It should be noted that, in this step, extracting text information from text objects, extracting text information and vo...
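The staged flow described above (extract the text and its preceding context, generate emotional features with a first model, then synthesize speech conditioned on those features with a second model) can be sketched as follows. This is a minimal illustrative sketch only: the class and function names are hypothetical, and the two trained models of the patent are replaced here by trivial stand-in functions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Utterance:
    """Processing object: the text to be processed plus its preceding information."""
    text: str
    preceding_text: List[str] = field(default_factory=list)
    preceding_audio: Optional[bytes] = None  # present only for speech/video objects

def first_model(utt: Utterance) -> dict:
    """Stand-in for the pre-built 'first model': (text, context) -> emotion features.
    A toy keyword heuristic replaces the trained network here."""
    context = " ".join(utt.preceding_text + [utt.text]).lower()
    emotion = "happy" if "great" in context else "neutral"
    return {"emotion": emotion, "intensity": 0.8 if emotion != "neutral" else 0.2}

def second_model(features: dict, text: str) -> str:
    """Stand-in for the pre-trained 'second model': (features, text) -> speech.
    Returns a tagged string in place of an actual waveform."""
    return f"<speech emotion={features['emotion']!r}>{text}</speech>"

def synthesize(utt: Utterance) -> str:
    # S1 (extraction) is assumed done when constructing `utt`.
    features = first_model(utt)               # S2: generate emotional feature information
    return second_model(features, utt.text)   # S3: synthesize emotional speech

wav = synthesize(Utterance("That is great news!", preceding_text=["How was the exam?"]))
print(wav)  # <speech emotion='happy'>That is great news!</speech>
```

Note that the emotion label is inferred from the text and its preceding context rather than read from manual annotations, which is the point of the method.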

Embodiment 2

[0127] To implement the deep learning-based emotional speech synthesis method of Embodiment 1, this embodiment provides a deep learning-based emotional speech synthesis apparatus 100.

[0128] Figure 5 is a schematic diagram of the structure of the deep learning-based emotional speech synthesis device 100. As shown in Figure 5, the device 100 includes at least:

[0129] Extraction module 1: used to extract the text information to be processed and the preceding information of the text information to be processed, and the preceding information includes the preceding text information;

[0130] Emotional feature information generation module 2: used to generate emotional feature information through a pre-built first model with the text information to be processed and the preceding information as input;

[0131] Emotional speech synthesis module 3: used for synthesizing emotional speech through the pre-trained second model using the emotional feature informa...
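The three modules above can be sketched as a simple composition; again, all class names and the trivial per-module logic are hypothetical stand-ins for the patent's trained models, shown only to illustrate how the device chains Modules 1 through 3.

```python
class ExtractionModule:
    """Module 1: split a dialogue into the text to process and its preceding info."""
    def run(self, dialogue):
        *preceding, current = dialogue
        return current, preceding

class EmotionFeatureModule:
    """Module 2: pre-built first model (stubbed with a constant label)."""
    def run(self, text, preceding):
        return {"emotion": "neutral", "context_len": len(preceding)}

class SynthesisModule:
    """Module 3: pre-trained second model (stubbed as a tagged string)."""
    def run(self, features, text):
        return f"[{features['emotion']}] {text}"

class EmotionalTTSDevice:
    """Device 100: composes the three modules in sequence."""
    def __init__(self):
        self.extract = ExtractionModule()
        self.emotion = EmotionFeatureModule()
        self.synth = SynthesisModule()

    def __call__(self, dialogue):
        text, preceding = self.extract.run(dialogue)
        return self.synth.run(self.emotion.run(text, preceding), text)

print(EmotionalTTSDevice()(["Hello", "How are you?", "Fine, thanks."]))
# prints "[neutral] Fine, thanks."
```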



Abstract

The invention discloses a deep learning-based emotional speech synthesis method and device, belonging to the field of speech synthesis. The method comprises at least the following steps: extracting the text information to be processed and the preceding information of that text information, the preceding information including preceding text information; generating emotional feature information through a pre-built first model, with the text information to be processed and the preceding information as input; and synthesizing emotional speech through a pre-trained second model, with the emotional feature information and the text information to be processed as input. On the basis of text information alone, deep learning enables the synthesis of emotional speech without manually labeling each acoustic pronunciation in advance. The method reduces labeling errors and labor costs, enriches the emotion of dialogue speech, improves the naturalness and fluency of the synthesized speech, improves the human-computer communication experience, and has wide applicability.

Description

technical field

[0001] The present invention relates to the field of speech synthesis, and in particular to a deep learning-based emotional speech synthesis method and device.

Background technique

[0002] With current social development, people hope that machines can replace humans in simple and repetitive tasks, such as broadcasting and basic customer service, and that communication with machines can be natural and harmonious. Voice, as an important communication method in human society, largely determines whether this natural and harmonious human-machine communication can be realized. Therefore, speech synthesis has important research significance in the fields of affective computing and signal processing, and delicate emotional expression can greatly improve the naturalness of synthesized speech.

[0003] The existing practice is generally based on labeling information, and each acoustic pronunciation in each speech is...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G10L13/02; G10L13/08; G10L25/27; G10L25/63
CPC: G10L13/02; G10L13/08; G10L25/27; G10L25/63
Inventor 钟雨崎
Owner SUNING CLOUD COMPUTING CO LTD